This repository has been archived by the owner on Nov 9, 2020. It is now read-only.

TTY allocation on node breaks if vsphere storage plugin is installed #2078

Open
raptaml opened this issue Mar 27, 2018 · 13 comments

@raptaml

raptaml commented Mar 27, 2018

If I install the latest version of vsphere-storage-for-docker as a managed plugin, TTY allocation breaks from that moment on and makes remote shell access impossible.
I am using:

Ubuntu 16.04 LTS:
(4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux)

Docker:
Server:
Engine:
Version: 18.03.0-ce
API version: 1.37 (minimum version 1.12)
Go version: go1.9.4
Git commit: 0520e24
Built: Wed Mar 21 23:08:31 2018
OS/Arch: linux/amd64
Experimental: false

ESXi VIB:
esx-vmdkops-service 0.21.c420818-0.0.1 (via UM ZIP Bundle)

Docker plugin:
vsphere-storage-for-docker:latest

When I remove the plugin and reboot the node, everything starts working normally again. I have also tried vsphere-storage-for-docker:0.20 and vsphere-storage-for-docker:0.19, which show the same behaviour.

/var/log/auth.log shows:
Mar 27 09:57:42 SERVERNAME sshd[2451]: error: openpty: No such file or directory
Mar 27 09:57:42 SERVERNAME sshd[2487]: error: session_pty_req: session 0 alloc failed

This is totally reproducible here.
Any ideas?
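
A minimal diagnostic sketch for checking the PTY state on a broken node (assuming a standard Linux /dev layout; the exact output will vary):

$ grep devpts /proc/self/mounts    # list every devpts mount and its options
$ ls -l /dev/ptmx /dev/pts/ptmx    # the pty multiplexer must exist and be accessible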

@govint
Contributor

govint commented Mar 27, 2018

@raptaml thanks for letting us know of this issue, but can you say how you are establishing the remote shell with the VM running the plugin? The plugin is just a process on the host, and it doesn't use TTYs either, only a socket to the ESX host (vSockets). Does a reboot after installing the plugin help? This may have more to do with Docker than with the plugin.

@raptaml
Author

raptaml commented Mar 27, 2018

@govint any remote shell will produce the error. I am trying to SSH into the VM from Linux, but PuTTY from Windows does not work either. A reboot does not fix the problem, but uninstalling the plugin and then rebooting does.
Even immediately after installing the plugin, a sudo command in the same remote shell gives me:
"no tty present and no askpass program specified"
I have to uninstall the plugin via local shell and reboot the server then.
strange...

@govint
Contributor

govint commented Mar 29, 2018

OK, but this is exactly how we use the plugin, and we don't see any issue. This is what I have on Ubuntu 14.04:

$ sudo docker plugin ls
[sudo] password for guest:
ID NAME DESCRIPTION ENABLED
56866816829b vsphere:latest VMWare vSphere Docker Volume plugin true
$ sudo docker version
Client:
Version: 18.03.0-ce
API version: 1.37
Go version: go1.9.4
Git commit: 0520e24
Built: Wed Mar 21 23:10:22 2018
OS/Arch: linux/amd64
Experimental: false
Orchestrator: swarm

Server:
Engine:
Version: 18.03.0-ce
API version: 1.37 (minimum version 1.12)
Go version: go1.9.4
Git commit: 0520e24
Built: Wed Mar 21 23:08:52 2018
OS/Arch: linux/amd64
Experimental: false
$ id
uid=1001(guest) gid=1001(guest) groups=1001(guest),27(sudo)

I'm not able to find this behavior reported elsewhere either. Let me check.

@johnbaker

I see the same behavior on a fully updated Ubuntu Server 16.04 install as soon as I install the vSphere storage plugin. After trying a few other plugins, I have found this isn't the only one that causes it; for example, the Pure Storage plugin (store/purestorage) also fails.

Not all storage plugins cause the issue; for instance, I can use the SSHFS plugin (vieux/sshfs) without problems.

@bartlenaerts

I have the same problem on a freshly installed CentOS 7 server. I installed the latest version of Docker (18.03.0-ce, build 0520e24) and created a swarm with several nodes. No problems so far; I could still connect to the server with SSH. But as soon as the vSphere plugin is installed, an SSH connection to the server is no longer possible. Rebooting doesn't help; only uninstalling the plugin and then rebooting works.

@govint
Contributor

govint commented Apr 10, 2018

Let me try the same config. With Ubuntu we have no issues using the plugin. I haven't seen this with Alpine, or even with CentOS on earlier Docker versions. Let me recheck and post.

@grekier

grekier commented Apr 12, 2018

Same problem here on Ubuntu 16.04.4 LTS with kernel 4.4.0-119.
I also noticed a recurring error in the plugin log: Failed to get volume meta-data name=XXX error="Volume XXX not found (file: /vmfs/volumes/MAIN-RAID5/dockvols/_DEFAULT/XXX.vmdk)"
Also worth mentioning: I have a swarm with 3 master nodes.
Don't know if it helps, but I thought I would add the info.
Same version of Docker and plugin as above.

@raptaml
Author

raptaml commented Apr 17, 2018

@govint Do you have any update for us? Should we file a bug with the Docker team, or what would you suggest?

@ghost

ghost commented Apr 17, 2018

Good Day,

I have the same issue, and it has to do with devpts. When a plugin is installed, a second devpts is mounted. Below are the differing options of the two mounts:

rw,relatime,mode=600,ptmxmode=000
rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666

The first works with no issues; with the plugins the second one comes into play, and that is when logins stop working unless you use the console.

From the console I unmount both devpts instances and then mount as follows:

mount devpts /dev/pts -t devpts
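
Spelled out in full, the workaround is roughly this (a sketch based on the steps above; the devpts options are examples and may need adjusting to your distribution's defaults):

# run as root from the VM console, not over SSH
$ umount /dev/pts    # repeat until no devpts mount is left
$ umount /dev/pts
$ mount -t devpts devpts /dev/pts -o gid=5,mode=620,ptmxmode=666    # gid=5 is the tty group on most distributions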

Regards

@grekier

grekier commented Apr 18, 2018

I see almost the same as @Eireocean:
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666)
The second line is inserted after the plugin install.
As a side note, removing the last one actually fixes the issue for me (see the sketch at the end of this comment).

Not sure if it helps but it seems that something similar happened earlier in systemd (systemd/systemd#337)
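
If you only want to drop the stacked mount that appears after the plugin install, something like this should be enough (a sketch; check the mount list first):

$ findmnt /dev/pts    # or: grep devpts /proc/self/mounts
$ sudo umount /dev/pts    # removes the topmost (most recent) devpts instance and leaves the original in place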

@liqdfire

I have the same issue with Docker 18.03.

I removed it and reinstalled 17.12.1-ce, and had no issue reinstalling the volume plugin.

@SaintMartin

I also ran into the same problem. I was running a 6-node swarm (3 managers) on:
Docker Version: 18.03.0-ce

Kernel Version: 3.10.0-693.21.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
On ESXi 6.0 (VM version 11)

ESXi VIB:
VMWare_bootbank_esx-vmdkops-service_0.21.c420818-0.0.1.vib
Docker plugin:
vsphere-storage-for-docker:latest

One day, when I attempted to SSH into any of the nodes, I got: "PTY allocation request failed on channel 0".
The solution was to log in to the VMware console, remove the volumes, disable the plugin, and reboot the VM.
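
Roughly, as commands run from the VMware console (a sketch; the volume name is a placeholder):

$ docker volume rm <vsphere-volume-name>    # repeat for each volume backed by the plugin
$ docker plugin disable vsphere-storage-for-docker:latest
$ sudo reboot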

BTW, the problem started around March 26. According to /var/log/vmware-vmsvc.1.log:
[Mar 26 13:42:59.026] [ warning] [guestinfo] GetDiskInfo: ERROR: Partition name buffer too small
[Mar 26 13:42:59.026] [ warning] [guestinfo] Failed to get disk info.
[Mar 26 13:43:29.024] [ warning] [guestinfo] GetDiskInfo: ERROR: Partition name buffer too small
[Mar 26 13:43:29.024] [ warning] [guestinfo] Failed to get disk info.
[Mar 26 13:43:59.024] [ warning] [guestinfo] GetDiskInfo: ERROR: Partition name buffer too small
[Mar 26 13:43:59.024] [ warning] [guestinfo] Failed to get disk info.
[Mar 26 13:44:29.024] [ warning] [guestinfo] GetDiskInfo: ERROR: Partition name buffer too small
[Mar 26 13:44:29.024] [ warning] [guestinfo] Failed to get disk info.
[Mar 26 13:44:59.025] [ warning] [guestinfo] GetDiskInfo: ERROR: Partition name buffer too small
[Mar 26 13:44:59.025] [ warning] [guestinfo] Failed to get disk info.
And so on

I hope that helps.

@grekier

grekier commented May 11, 2018

Seems like upgrading to Docker 18.03.1-ce fixes this issue. At least I can SSH to my Docker servers now, with kernel 4.4.0-124.
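
On Ubuntu the upgrade is roughly this (a sketch, assuming Docker's apt repository is already configured; the exact version string will differ):

$ sudo apt-get update
$ apt-cache madison docker-ce    # find the exact 18.03.1-ce package version
$ sudo apt-get install docker-ce=<version>    # use the 18.03.1~ce string from the line above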
