
stolon in docker swarm breaks after upgrading docker 19.x to 20.x with multiple ips #835

Open
johannesboon opened this issue May 14, 2021 · 2 comments

johannesboon commented May 14, 2021

What happened:

pg_proxy is switching multiple times per minute between 2 different IP-addresses for the master keeper (breaking any existing connections/transactions/queries), after upgrading Docker from 19.03.13 to 20.10.6.

Probably caused by:
moby/moby#39204

This sounds similar but is not directly related, since the 2 IP-addresses are also visible from outside the cluster:
moby/moby#30963 (or at least the 2021 comment moby/moby#30963 (comment) from https://github.com/pwFoo)

What you expected to happen:

  1. The reference configuration should not break the cluster after a Docker upgrade.
  2. Cases where multiple IP-addresses are available should be handled consistently (e.g. ordered numerically, instead of the random order returned by Docker DNS?)
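To illustrate point 2, the ordering problem can be observed from inside any container on the overlay network. The service name `keeper0` below is a hypothetical placeholder; the point is that Docker's embedded DNS may return the task IPs in a different order on each query, while a numeric sort would make the choice deterministic:

```shell
# Hypothetical service name; run inside a container attached to the
# overlay network. The embedded DNS may list 2+ IPs in varying order.
getent hosts keeper0

# One way to pick a deterministic address (the numerically lowest IPv4),
# as suggested above: sort each octet numerically, then take the first.
getent hosts keeper0 | awk '{print $1}' \
  | sort -t. -k1,1n -k2,2n -k3,3n -k4,4n | head -n1
```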

How to reproduce it (as minimally and precisely as possible):

With our setup based on this example: https://github.com/sorintlab/stolon/blob/master/examples/swarm/docker-compose-pg.yml#L24 we also defined:

  • multiple keeper services
  • each with 1 instance
  • a hostname equal to the service name

We also added placement constraints, amongst other settings.
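Our additions on top of the referenced compose file looked roughly like the sketch below; the service name and node label are illustrative, not our exact values:

```yaml
# Sketch of one keeper service (repeated per keeper); names are illustrative.
services:
  keeper0:
    image: sorintlab/stolon:master-pg10
    hostname: keeper0            # hostname equal to the service name
    deploy:
      replicas: 1                # each keeper service runs 1 instance
      placement:
        constraints:
          - node.labels.keeper0 == true   # hypothetical placement label
```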

Anything else we need to know?:

Environment:

  • Stolon version: 81425db-dirty
  • Stolon running environment (if useful to understand the bug): Oracle Linux 7 with kernel: 4.14.35-2047.503.1.el7uek.x86_64 (should be compatible with CentOS 7, only higher kernel version)
  • Others:

Upgrading Docker CE from 19.x to 20.x involved these packages, with the main yum repositories shown in the right-hand column (I included anything I thought could be remotely related):

    Updated     bind-export-libs-32:9.11.4-26.P2.el7_9.4.x86_64          @ol7_latest
    Update                       32:9.11.4-26.P2.el7_9.5.x86_64          @ol7_latest
    Updated     bind-libs-32:9.11.4-26.P2.el7_9.4.x86_64                 @ol7_latest
    Update                32:9.11.4-26.P2.el7_9.5.x86_64                 @ol7_latest
    Updated     bind-libs-lite-32:9.11.4-26.P2.el7_9.4.x86_64            @ol7_latest
    Update                     32:9.11.4-26.P2.el7_9.5.x86_64            @ol7_latest
    Updated     container-selinux-2:2.77-5.el7.noarch                    @ol7_addons
    Update                        2:2.119.2-1.911c772.el7_8.noarch       @extras
    Updated     containerd.io-1.3.7-3.1.el7.x86_64                       @docker-ce-stable
    Update                    1.4.4-3.1.el7.x86_64                       @docker-ce-stable
    Updated     docker-ce-3:19.03.13-3.el7.x86_64                        @docker-ce-stable
    Update                3:20.10.6-3.el7.x86_64                         @docker-ce-stable
    Updated     docker-ce-cli-1:19.03.13-3.el7.x86_64                    @docker-ce-stable
    Update                    1:20.10.6-3.el7.x86_64                     @docker-ce-stable
    Dep-Install docker-ce-rootless-extras-20.10.6-3.el7.x86_64           @docker-ce-stable
    Dep-Install docker-scan-plugin-0.7.0-3.el7.x86_64                    @docker-ce-stable
    Updated     firewalld-filesystem-0.6.3-12.0.1.el7.noarch             @ol7_latest
    Update                           0.6.3-13.0.1.el7_9.noarch           @ol7_latest
    Dep-Install fuse-overlayfs-0.7.2-6.el7_8.x86_64                      @extras
    Dep-Install fuse3-libs-3.6.1-4.el7.x86_64                            @extras
    Erase       kernel-uek-4.14.35-2025.404.1.1.el7uek.x86_64            @ol7_UEKR5
    Install     kernel-uek-4.14.35-2047.503.1.el7uek.x86_64              @ol7_UEKR5
    Updated     kernel-uek-tools-4.14.35-2047.501.1.el7uek.x86_64        @ol7_UEKR5
    Update                       4.14.35-2047.503.1.el7uek.x86_64        @ol7_UEKR5
    Updated     lvm2-7:2.02.187-6.0.3.el7_9.3.x86_64                     @ol7_latest
    Update           7:2.02.187-6.0.3.el7_9.5.x86_64                     @ol7_latest
    Updated     lvm2-libs-7:2.02.187-6.0.3.el7_9.3.x86_64                @ol7_latest
    Update                7:2.02.187-6.0.3.el7_9.5.x86_64                @ol7_latest
    Updated     nss-3.53.1-3.el7_9.x86_64                                @ol7_latest
    Update          3.53.1-7.el7_9.x86_64                                @ol7_latest
    Updated     nss-sysinit-3.53.1-3.el7_9.x86_64                        @ol7_latest
    Update                  3.53.1-7.el7_9.x86_64                        @ol7_latest
    Updated     nss-tools-3.53.1-3.el7_9.x86_64                          @ol7_latest
    Update                3.53.1-7.el7_9.x86_64                          @ol7_latest
    Updated     selinux-policy-3.13.1-268.0.1.el7_9.2.noarch             @ol7_latest
    Update                     3.13.1-268.0.3.el7_9.2.noarch             @ol7_latest
    Updated     selinux-policy-targeted-3.13.1-268.0.1.el7_9.2.noarch    @ol7_latest
    Update                              3.13.1-268.0.3.el7_9.2.noarch    @ol7_latest
    Dep-Install slirp4netns-0.4.3-4.el7_8.x86_64                         @extras
@johannesboon (Author):
Ah, this was also reported in: #826

sgotti commented Jul 15, 2021

@johannesboon I think the main issue is just in the docker example, which uses hostnames instead of IPs. When the DNS becomes a round-robin DNS, as in this case, you get such issues. A working fix, without using an advertise address as done in #836, could be to use the container's IP as the listen address instead of the hostname (like the k8s example does, where we use the pod IP).
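A sketch of that suggestion as a compose fragment, assuming a `--pg-listen-address` style flag on the keeper; the exact entrypoint wiring and remaining flags depend on your setup and are illustrative here:

```yaml
# Hypothetical fragment: resolve the container's own IP at start-up and
# use it as the listen address, instead of the round-robin hostname.
# Note the doubled $$ so compose does not expand the variable itself.
services:
  keeper0:
    entrypoint: ["sh", "-c"]
    command:
      - exec stolon-keeper --pg-listen-address "$$(hostname -i | awk '{print $$1}')" ...
```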
