Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test improvements #1931

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Test improvements #1931

wants to merge 4 commits into from

Conversation

faderskd
Copy link
Contributor

@faderskd faderskd commented Nov 25, 2024

The purpose

  1. I've tried to get rid of flaky integration/slowIntegration tests (running on CI/GA) where after creating a topic the publishing on that topic fails with TOPIC_NOT_EXIST (404) error.
  2. By the way of debugging the problem I found that the problems with indefinitely PENDING subscriptions after creating the subscription are caused by the same problem.

The problem

  1. We have the following test example
  @Test
  public void shouldPublishAndConsumeMessage() {
    // given
    TestSubscriber subscriber = subscribers.createSubscriber();
    Topic topic = hermes.initHelper().createTopic(topicWithRandomName().build());
    hermes
        .initHelper()
        .createSubscription(
            subscription(topic.getQualifiedName(), "subscription1", subscriber.getEndpoint())
                .build());
    TestMessage message = TestMessage.of("hello", "world");

    // when
    hermes.api().publishUntilSuccess(topic.getQualifiedName(), message.body());

    // then
    subscriber.waitUntilReceived(message.body());
  }
  1. It sometimes happens that despite the fact that the topic was created the publishing on that topic fails with 404 messsage.
hermes.api().publishUntilSuccess(topic.getQualifiedName(), message.body());
  1. The reason is that the topic was created by management and correctly saved to the Zookeeper, but the frontend was not notified via zookeeper watch.
  2. Hermes Frontend is watching any newly created topics by:
    • setting watch to the /hermes/groups path
    • when new group is created the frontend is notified and then it sets watch for: /hermes/groups/{groupName}/topics path
    • this is accomplished by the HierarchicalCache class
    • once the topic is created, the Zookeeper Server notifies frontend which is visible in logs
    • note that this event is also fired for the existing /hermes/groups, /hermes/groups/.../topics paths when we connecting to the zookeeper for the first time (otherwise we could not fetch already existing zookeeper entries). Thus we don't have any refreshing mechanism in place.
  3. The problems starts when something bad happens to the connections between frontend/consumers and zookeeper server so they don't receive notification via watches.
  4. Thus the fact of creating a topic is not noticable by frontend and we are getting 404 when publishing.
  5. For consumers module the lack of notifications results in subscriptions not being activated as the consumers are not creating a ConsumerProcess instance.

Analysis

  1. I've started by checking the hypothesis that the notification system is not working correctly by periodically refreshing all zookeeper resources and it indeed helped.
  2. Then I've disabled refreshing and implemented manual notifications by updating cache whenever a new topic was created. This also resulted in elimination of 404 errors while publishing.
  3. Then I've disabled the previous "quick fixes" and started digging into logs. I've enabled the following logging:
    • verbose mode for gradle
    • I've added module field to the cache so it will be easily visible which module (consumers/frontend) is getting/missing notification
    • enabled debug logging for zookeeper client and curator framework
    • enabled debug logging for zookeeper server container
    • I had to clear zookeeper data after each test because without it the logs were too messy (there were too many logs on zookeeper paths from previous tests)

Observations

  1. Our tests (with few exceptions) starts dependencies only once (kafka/zookeeper/...)
  2. Some of the tests require changes in properties, then the hermes integration "framework" detects that situation and restarts a changed module
  3. The notifications can be missed by a frontend/consumer modules right from the beginning.
  4. The notification to the module can work incorrectly, then after restarting another independent module, the previous notification can start working. For example, the consumer module app is not getting the notifications from zookeeper, then the management is restarted and suddenly the consumers starts receiving notifications without event reconnecting or restarting itself.
  5. Something weird happens on zookeeper container during module restarts so other connections from other modules start or stop receiving notifications without being restarted.
  6. With the logging described in analysis section I've dig into a few broken sessions and:
    • the client connecting to the zookeeper server is getting session id: org.apache.zookeeper.ClientCnxn - Session establishment complete on server localhost/[0:0:0:0:0:0:0:1]:32803, session id = 0x10000040c290002, negotiated timeout = 60000 (this is ok)
    • we can use this to follow the session logs on the client and on the server side
    • whenever the new topic is created the zookeeper will push the event to the frontend/consumers module and it will be visible in zookeeper client's logs: Got WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/hermes/groups/TopicBuilderGroup1/topics zxid: -1 for session id 0x1000004d7030002 (this is ok)
    • in the failed tests these events were not been received
    • for the broken (or temporarily broken) connections the client starts sending ping request and if the server is alive it will respond with pong response, which is visible in logs: org.apache.zookeeper.ClientCnxn - Got ping response for session id: 0x1000004d7030000 after 0ms.. The ping request/responses means that the connection is "alive" but there is no data sent from the server:
      The session is kept alive by requests sent by the client. If the session is idle for a period of time that would timeout the session, the client will send a PING request to keep the session alive. This PING request not only allows the ZooKeeper server to know that the client is still active, but it also allows the client to verify that its connection to the ZooKeeper server is still active. The timing of the PING is conservative enough to ensure reasonable time to detect a dead connection and reconnect to a new server..
  7. It was typical to see sequence of logs like:
    • org.apache.zookeeper.ClientCnxn - Session establishment complete on server localhost/[0:0:0:0:0:0:0:1]:32803, session id = 0x10000040c290002, negotiated timeout = 60000
    • then many logs Got ping response for session id: 0x10000040c290002 after 1ms.
    • many failed tests resulting in 404 TOPIC_NOT_EXISTS
    • then restart of another module like management
    • and suddenly logs
      2024-11-25T12:38:14.0432627Z [Test worker-SendThread(localhost:32803)] - 2024-11-25 12:38:14.019 DEBUG org.apache.zookeeper.ClientCnxn - Got notification session id: 0x10000040c290002
      2024-11-25T12:38:14.0436596Z [Test worker-SendThread(localhost:32803)] - 2024-11-25 12:38:14.019 DEBUG org.apache.zookeeper.ClientCnxn - Got WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/hermes/groups zxid: -1 for session id 0x10000040c290002
    • then no more problems caused by lack of notifications from zookeeper
    • then many passing tests cases
  8. Finally I found that the connections with notification problems lack of socket accept logs: DEBUG Accepted socket connection from /172.17.0.1:43038 (org.apache.zookeeper.server.NIOServerCnxnFactory). It is weird to not see such log because on the client side there is a clear evidence of connection establishment. Below are examples of broken and correct connection establishment process:
-------------------NOT WORKING----------------------
[Test worker] - 2024-11-25 12:45:07.548 INFO  org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=localhost:32803 sessionTimeout=10000000 watcher=org.apache.curator.ConnectionState@29f62163
2024-11-25T13:01:41.8060576Z     [Test worker] - 2024-11-25 12:45:07.549 INFO  o.apache.zookeeper.ClientCnxnSocket - jute.maxbuffer value is 1048575 Bytes
2024-11-25T13:01:41.8061208Z     [Test worker] - 2024-11-25 12:45:07.549 INFO  org.apache.zookeeper.ClientCnxn - zookeeper.request.timeout value is 0. feature enabled=false
2024-11-25T13:01:41.8061873Z     [Test worker-SendThread(localhost:32803)] - 2024-11-25 12:45:07.549 DEBUG o.a.zookeeper.SaslServerPrincipal - Canonicalized address to localhost
2024-11-25T13:01:41.8062671Z     [Test worker-SendThread(localhost:32803)] - 2024-11-25 12:45:07.549 INFO  org.apache.zookeeper.ClientCnxn - Opening socket connection to server localhost/[0:0:0:0:0:0:0:1]:32803.
2024-11-25T13:01:41.8063620Z     [Test worker-SendThread(localhost:32803)] - 2024-11-25 12:45:07.549 INFO  org.apache.zookeeper.ClientCnxn - SASL config status: Will not attempt to authenticate using SASL (unknown error)
2024-11-25T13:01:41.8064722Z     [Test worker-SendThread(localhost:32803)] - 2024-11-25 12:45:07.550 INFO  org.apache.zookeeper.ClientCnxn - Socket connection established, initiating session, client: /[0:0:0:0:0:0:0:1]:41276, server: localhost/[0:0:0:0:0:0:0:1]:32803
2024-11-25T13:01:41.8065539Z     [Test worker-SendThread(localhost:32803)] - 2024-11-25 12:45:07.550 DEBUG org.apache.zookeeper.ClientCnxn - Session establishment request sent on localhost/[0:0:0:0:0:0:0:1]:32803
2024-11-25T13:01:41.8065973Z     [Test worker] - 2024-11-25 12:45:07.551 INFO  o.a.c.f.imps.CuratorFrameworkImpl - Default schema
2024-11-25T13:01:41.8067049Z     [Test worker-SendThread(localhost:32803)] - 2024-11-25 12:45:07.551 INFO  org.apache.zookeeper.ClientCnxn - Session establishment complete on server localhost/[0:0:0:0:0:0:0:1]:32803, session id = 0x10000040c290007, negotiated timeout = 60000
2024-11-25T13:01:41.8067926Z     [Test worker-EventThread] - 2024-11-25 12:45:07.551 DEBUG org.apache.curator.ConnectionState - Negotiated session timeout: 60000
2024-11-25T13:01:41.8068496Z     [Test worker-EventThread] - 2024-11-25 12:45:07.551 INFO  o.a.c.f.state.ConnectionStateManager - State change: CONNECTED
2024-11-25T13:01:41.8070233Z     [Test worker-SendThread(localhost:32803)] - 2024-11-25 12:45:07.552 DEBUG org.apache.zookeeper.ClientCnxn - Reading reply session id: 0x10000040c290007, packet:: clientPath:/zookeeper/config serverPath:/zookeeper/config finished:false header:: 1,4  replyHeader:: 1,2842,0  request:: '/zookeeper/config,T  response:: ,s{0,0,0,0,0,0,-1,0,0,0,0}
2024-11-25T13:01:41.8070830Z     [Test worker] - 2024-11-25 12:45:07.558 INFO  p.a.t.h.i.d.DefaultDatacenterNameProvider - Providing default datacenter name: dc
2024-11-25T13:01:41.8071390Z     [Test worker] - 2024-11-25 12:45:07.558 INFO  p.a.t.h.i.d.DefaultDatacenterNameProvider - Providing default datacenter name: dc
2024-11-25T13:01:41.8071969Z     [Test worker-EventThread] - 2024-11-25 12:45:07.558 DEBUG o.a.c.f.imps.CuratorFrameworkImpl - Clearing sleep for 1 operations
2024-11-25T13:01:41.8072544Z     [Test worker-EventThread] - 2024-11-25 12:45:07.558 INFO  o.a.c.framework.imps.EnsembleTracker - New config event received: {}
2024-11-25T13:01:41.8073138Z     [Test worker-EventThread] - 2024-11-25 12:45:07.559 DEBUG o.a.c.framework.imps.EnsembleTracker - Ignoring new config as it is empty
2024-11-25T13:01:41.8073664Z     [Test worker] - 2024-11-25 12:45:07.559 INFO  p.a.t.h.i.z.ZookeeperGroupRepository - Before ensuring init path exists

------------------- WORKING-1----------------------
[Test worker-SendThread(localhost:32803)] - 2024-11-25 12:47:36.125 INFO  org.apache.zookeeper.ClientCnxn - Opening socket connection to server localhost/127.0.0.1:32803.
2024-11-25T13:01:43.4907042Z     [Test worker-SendThread(localhost:32803)] - 2024-11-25 12:47:36.125 INFO  org.apache.zookeeper.ClientCnxn - SASL config status: Will not attempt to authenticate using SASL (unknown error)
2024-11-25T13:01:43.4908968Z     [Test worker-SendThread(localhost:32803)] - 2024-11-25 12:47:36.126 INFO  org.apache.zookeeper.ClientCnxn - Socket connection established, initiating session, client: /127.0.0.1:44370, server: localhost/127.0.0.1:32803
2024-11-25T13:01:43.4910680Z     [Test worker-SendThread(localhost:32803)] - 2024-11-25 12:47:36.126 DEBUG org.apache.zookeeper.ClientCnxn - Session establishment request sent on localhost/127.0.0.1:32803
2024-11-25T13:01:43.4912471Z     [docker-java-stream-1547584714] - 2024-11-25 12:47:36.127 INFO  HermesZookeeper - STDOUT: [2024-11-25 12:47:36,126] DEBUG Accepted socket connection from /172.17.0.1:43992 (org.apache.zookeeper.server.NIOServerCnxnFactory)
2024-11-25T13:01:43.4914686Z     [docker-java-stream-1547584714] - 2024-11-25 12:47:36.127 INFO  HermesZookeeper - STDOUT: [2024-11-25 12:47:36,127] DEBUG Session establishment request from client /172.17.0.1:43992 client's lastZxid is 0x0 (org.apache.zookeeper.server.ZooKeeperServer)
2024-11-25T13:01:43.4916712Z     [docker-java-stream-1547584714] - 2024-11-25 12:47:36.127 INFO  HermesZookeeper - STDOUT: [2024-11-25 12:47:36,127] DEBUG Adding session 0x10000040b520004 (org.apache.zookeeper.server.SessionTrackerImpl)
2024-11-25T13:01:43.4918936Z     [docker-java-stream-1547584714] - 2024-11-25 12:47:36.127 INFO  HermesZookeeper - STDOUT: [2024-11-25 12:47:36,127] DEBUG Client attempting to establish new session: session = 0x10000040b520004, zxid = 0x0, timeout = 10000000, address = /172.17.0.1:43992 (org.apache.zookeeper.server.ZooKeeperServer)
2024-11-25T13:01:43.4921230Z     [Test worker-SendThread(localhost:32803)] - 2024-11-25 12:47:36.129 INFO  org.apache.zookeeper.ClientCnxn - Session establishment complete on server localhost/127.0.0.1:32803, session id = 0x10000040b520004, negotiated timeout = 60000
2024-11-25T13:01:43.4923692Z     [docker-java-stream-1547584714] - 2024-11-25 12:47:36.129 INFO  HermesZookeeper - STDOUT: [2024-11-25 12:47:36,127] DEBUG Processing request:: sessionid:0x10000040b520004 type:createSession cxid:0x0 zxid:0x375 txntype:-10 reqpath:n/a (org.apache.zookeeper.server.FinalRequestProcessor)
2024-11-25T13:01:43.4926131Z     [docker-java-stream-1547584714] - 2024-11-25 12:47:36.129 INFO  HermesZookeeper - STDOUT: [2024-11-25 12:47:36,128] DEBUG sessionid:0x10000040b520004 type:createSession cxid:0x0 zxid:0x375 txntype:-10 reqpath:n/a (org.apache.zookeeper.server.FinalRequestProcessor)
2024-11-25T13:01:43.4928489Z     [docker-java-stream-1547584714] - 2024-11-25 12:47:36.129 INFO  HermesZookeeper - STDOUT: [2024-11-25 12:47:36,128] DEBUG Established session 0x10000040b520004 with negotiated timeout 60000 for client /172.17.0.1:43992 (org.apache.zookeeper.server.ZooKeeperServer)
2024-11-25T13:01:43.4930309Z     [Test worker-EventThread] - 2024-11-25 12:47:36.129 DEBUG org.apache.curator.ConnectionState - Negotiated session timeout: 60000
2024-11-25T13:01:43.4931478Z     [Test worker-EventThread] - 2024-11-25 12:47:36.130 INFO  o.a.c.f.state.ConnectionStateManager - State change: CONNECTED
2024-11-25T13:01:43.4932623Z     [Test worker-EventThread] - 2024-11-25 12:47:36.130 DEBUG o.a.c.f.imps.CuratorFrameworkImpl - Clearing sleep for 0 operations
2024-11-25T13:01:43.4935826Z     [Test worker-SendThread(localhost:32803)] - 2024-11-25 12:47:36.130 DEBUG org.apache.zookeeper.ClientCnxn - Reading reply session id: 0x10000040b520004, packet:: clientPath:/zookeeper/config serverPath:/zookeeper/config finished:false header:: 1,4  replyHeader:: 1,885,0  request:: '/zookeeper/config,T  response:: ,s{0,0,0,0,0,0,-1,0,0,0,0}
2024-11-25T13:01:43.4938212Z     [docker-java-stream-1547584714] - 2024-11-25 12:47:36.134 INFO  HermesZookeeper - STDOUT: [2024-11-25 12:47:36,130] DEBUG Checking session 0x10000040b520004 (org.apache.zookeeper.server.SessionTrackerImpl)
2024-11-25T13:01:43.4940613Z     [docker-java-stream-1547584714] - 2024-11-25 12:47:36.134 INFO  HermesZookeeper - STDOUT: [2024-11-25 12:47:36,130] DEBUG Processing request:: sessionid:0x10000040b520004 type:getData cxid:0x1 zxid:0xfffffffffffffffe txntype:unknown reqpath:/zookeeper/config (org.apache.zookeeper.server.FinalRequestProcessor)
2024-11-25T13:01:43.4943271Z     [docker-java-stream-1547584714] - 2024-11-25 12:47:36.134 INFO  HermesZookeeper - STDOUT: [2024-11-25 12:47:36,130] DEBUG sessionid:0x10000040b520004 type:getData cxid:0x1 zxid:0xfffffffffffffffe txntype:unknown reqpath:/zookeeper/config (org.apache.zookeeper.server.FinalRequestProcessor)
2024-11-25T13:01:43.4945480Z     [docker-java-stream-1547584714] - 2024-11-25 12:47:36.134 INFO  HermesZookeeper - STDOUT: [2024-11-25 12:47:36,130] DEBUG Permission requested: 1  (org.apache.zookeeper.server.PrepRequestProcessor)
2024-11-25T13:01:43.4947062Z     [docker-java-stream-1547584714] - 2024-11-25 12:47:36.134 INFO  HermesZookeeper - STDOUT: [2024-11-25 12:47:36,130] DEBUG ACLs for node: [1,s{'world,'anyone}
2024-11-25T13:01:43.4948527Z     [docker-java-stream-1547584714] - 2024-11-25 12:47:36.134 INFO  HermesZookeeper - STDOUT: ] (org.apache.zookeeper.server.PrepRequestProcessor)
2024-11-25T13:01:43.4949897Z     [docker-java-stream-1547584714] - 2024-11-25 12:47:36.134 INFO  HermesZookeeper - STDOUT: [2024-11-25 12:47:36,130] DEBUG Client credentials: ['ip,'172.17.0.1
2024-11-25T13:01:43.4951244Z     [docker-java-stream-1547584714] - 2024-11-25 12:47:36.134 INFO  HermesZookeeper - STDOUT: ] (org.apache.zookeeper.server.PrepRequestProcessor)
2024-11-25T13:01:43.4956024Z     [Test worker-EventThread] - 2024-11-25 12:47:36.138 INFO  o.a.c.framework.imps.EnsembleTracker - New config event received: {}
2024-11-25T13:01:43.4957222Z     [Test worker-EventThread] - 2024-11-25 12:47:36.138 DEBUG o.a.c.framework.imps.EnsembleTracker - Ignoring new config as it is empty
2024-11-25T13:01:43.4959432Z     [Test worker-SendThread(localhost:32803)] - 2024-11-25 12:47:36.139 DEBUG org.apache.zookeeper.ClientCnxn - Reading reply session id: 0x10000040b520004, packet:: clientPath:null serverPath:null finished:false header:: 2,3  replyHeader:: 2,885,0  request:: '/hermes/groups,F  response:: s{4,4,1732537799554,1732537799554,0,226,0,0,10,0,884}
2024-11-25T13:01:43.4961457Z     [Test worker] - 2024-11-25 12:47:36.139 INFO  p.a.t.h.i.z.ZookeeperGroupRepository - After ensuring init path exists

------------------- WORKING-2----------------------
[Test worker] - 2024-11-25 21:44:34.282 INFO  org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=localhost:32803 sessionTimeout=10000 watcher=org.apache.curator.ConnectionState@32fcec97
2024-11-25T21:44:34.3734452Z     [Test worker] - 2024-11-25 21:44:34.282 INFO  o.apache.zookeeper.ClientCnxnSocket - jute.maxbuffer value is 1048575 Bytes
2024-11-25T21:44:34.3736155Z     [Test worker] - 2024-11-25 21:44:34.282 INFO  org.apache.zookeeper.ClientCnxn - zookeeper.request.timeout value is 0. feature enabled=false
2024-11-25T21:44:34.3738356Z     [Test worker-SendThread(localhost:32803)] - 2024-11-25 21:44:34.283 DEBUG o.a.zookeeper.SaslServerPrincipal - Canonicalized address to localhost
2024-11-25T21:44:34.3744509Z     [Test worker-SendThread(localhost:32803)] - 2024-11-25 21:44:34.283 INFO  org.apache.zookeeper.ClientCnxn - Opening socket connection to server localhost/[0:0:0:0:0:0:0:1]:32803.
2024-11-25T21:44:34.3746625Z     [Test worker-SendThread(localhost:32803)] - 2024-11-25 21:44:34.283 INFO  org.apache.zookeeper.ClientCnxn - SASL config status: Will not attempt to authenticate using SASL (unknown error)
2024-11-25T21:44:34.3749259Z     [Test worker-SendThread(localhost:32803)] - 2024-11-25 21:44:34.284 INFO  org.apache.zookeeper.ClientCnxn - Socket connection established, initiating session, client: /[0:0:0:0:0:0:0:1]:56954, server: localhost/[0:0:0:0:0:0:0:1]:32803
2024-11-25T21:44:34.3751711Z     [Test worker-SendThread(localhost:32803)] - 2024-11-25 21:44:34.284 DEBUG org.apache.zookeeper.ClientCnxn - Session establishment request sent on localhost/[0:0:0:0:0:0:0:1]:32803
2024-11-25T21:44:34.3754062Z     [docker-java-stream-1702742317] - 2024-11-25 21:44:34.287 INFO  HermesZookeeper - STDOUT: [2024-11-25 21:44:34,284] DEBUG Accepted socket connection from /172.17.0.1:38168 (org.apache.zookeeper.server.NIOServerCnxnFactory)
2024-11-25T21:44:34.3756972Z     [docker-java-stream-1702742317] - 2024-11-25 21:44:34.287 INFO  HermesZookeeper - STDOUT: [2024-11-25 21:44:34,285] DEBUG Session establishment request from client /172.17.0.1:38168 client's lastZxid is 0x0 (org.apache.zookeeper.server.ZooKeeperServer)
2024-11-25T21:44:34.3759591Z     [docker-java-stream-1702742317] - 2024-11-25 21:44:34.287 INFO  HermesZookeeper - STDOUT: [2024-11-25 21:44:34,285] DEBUG Adding session 0x1000004d7030001 (org.apache.zookeeper.server.SessionTrackerImpl)
2024-11-25T21:44:34.3762906Z     [docker-java-stream-1702742317] - 2024-11-25 21:44:34.287 INFO  HermesZookeeper - STDOUT: [2024-11-25 21:44:34,285] DEBUG Client attempting to establish new session: session = 0x1000004d7030001, zxid = 0x0, timeout = 10000, address = /172.17.0.1:38168 (org.apache.zookeeper.server.ZooKeeperServer)
2024-11-25T21:44:34.3766212Z     [docker-java-stream-1702742317] - 2024-11-25 21:44:34.287 INFO  HermesZookeeper - STDOUT: [2024-11-25 21:44:34,285] DEBUG Processing request:: sessionid:0x1000004d7030001 type:createSession cxid:0x0 zxid:0xb txntype:-10 reqpath:n/a (org.apache.zookeeper.server.FinalRequestProcessor)
2024-11-25T21:44:34.3769490Z     [docker-java-stream-1702742317] - 2024-11-25 21:44:34.287 INFO  HermesZookeeper - STDOUT: [2024-11-25 21:44:34,286] DEBUG sessionid:0x1000004d7030001 type:createSession cxid:0x0 zxid:0xb txntype:-10 reqpath:n/a (org.apache.zookeeper.server.FinalRequestProcessor)
2024-11-25T21:44:34.3772576Z     [docker-java-stream-1702742317] - 2024-11-25 21:44:34.287 INFO  HermesZookeeper - STDOUT: [2024-11-25 21:44:34,286] DEBUG Established session 0x1000004d7030001 with negotiated timeout 10000 for client /172.17.0.1:38168 (org.apache.zookeeper.server.ZooKeeperServer)
2024-11-25T21:44:34.3776024Z     [Test worker-SendThread(localhost:32803)] - 2024-11-25 21:44:34.289 INFO  org.apache.zookeeper.ClientCnxn - Session establishment complete on server localhost/[0:0:0:0:0:0:0:1]:32803, session id = 0x1000004d7030001, negotiated timeout = 10000
2024-11-25T21:44:34.3890189Z     [Test worker-EventThread] - 2024-11-25 21:44:34.289 DEBUG org.apache.curator.ConnectionState - Negotiated session timeout: 10000
2024-11-25T21:44:34.3891491Z     [Test worker-EventThread] - 2024-11-25 21:44:34.289 INFO  o.a.c.f.state.ConnectionStateManager - State change: CONNECTED
2024-11-25T21:44:34.3892678Z     [Test worker-EventThread] - 2024-11-25 21:44:34.290 DEBUG o.a.c.f.imps.CuratorFrameworkImpl - Clearing sleep for 0 operations
2024-11-25T21:44:34.3893725Z     [Test worker] - 2024-11-25 21:44:34.290 INFO  o.a.c.f.imps.CuratorFrameworkImpl - Default schema
2024-11-25T21:44:34.3895578Z     [docker-java-stream-1702742317] - 2024-11-25 21:44:34.295 INFO  HermesZookeeper - STDOUT: [2024-11-25 21:44:34,294] DEBUG Checking session 0x1000004d7030001 (org.apache.zookeeper.server.SessionTrackerImpl)
2024-11-25T21:44:34.3898360Z     [docker-java-stream-1702742317] - 2024-11-25 21:44:34.295 INFO  HermesZookeeper - STDOUT: [2024-11-25 21:44:34,294] DEBUG Processing request:: sessionid:0x1000004d7030001 type:getData cxid:0x1 zxid:0xfffffffffffffffe txntype:unknown reqpath:/zookeeper/config (org.apache.zookeeper.server.FinalRequestProcessor)
2024-11-25T21:44:34.3902094Z     [docker-java-stream-1702742317] - 2024-11-25 21:44:34.295 INFO  HermesZookeeper - STDOUT: [2024-11-25 21:44:34,294] DEBUG sessionid:0x1000004d7030001 type:getData cxid:0x1 zxid:0xfffffffffffffffe txntype:unknown reqpath:/zookeeper/config (org.apache.zookeeper.server.FinalRequestProcessor)
2024-11-25T21:44:34.3904819Z     [docker-java-stream-1702742317] - 2024-11-25 21:44:34.295 INFO  HermesZookeeper - STDOUT: [2024-11-25 21:44:34,294] DEBUG Permission requested: 1  (org.apache.zookeeper.server.PrepRequestProcessor)
2024-11-25T21:44:34.3906715Z     [docker-java-stream-1702742317] - 2024-11-25 21:44:34.295 INFO  HermesZookeeper - STDOUT: [2024-11-25 21:44:34,294] DEBUG ACLs for node: [1,s{'world,'anyone}
2024-11-25T21:44:34.3908345Z     [docker-java-stream-1702742317] - 2024-11-25 21:44:34.295 INFO  HermesZookeeper - STDOUT: ] (org.apache.zookeeper.server.PrepRequestProcessor)
2024-11-25T21:44:34.3909991Z     [docker-java-stream-1702742317] - 2024-11-25 21:44:34.295 INFO  HermesZookeeper - STDOUT: [2024-11-25 21:44:34,294] DEBUG Client credentials: ['ip,'172.17.0.1
2024-11-25T21:44:34.3911618Z     [docker-java-stream-1702742317] - 2024-11-25 21:44:34.295 INFO  HermesZookeeper - STDOUT: ] (org.apache.zookeeper.server.PrepRequestProcessor)
2024-11-25T21:44:34.3914582Z     [Test worker-SendThread(localhost:32803)] - 2024-11-25 21:44:34.295 DEBUG org.apache.zookeeper.ClientCnxn - Reading reply session id: 0x1000004d7030001, packet:: clientPath:/zookeeper/config serverPath:/zookeeper/config finished:false header:: 1,4  replyHeader:: 1,11,0  request:: '/zookeeper/config,T  response:: ,s{0,0,0,0,0,0,-1,0,0,0,0}
2024-11-25T21:44:34.3917138Z     [Test worker-EventThread] - 2024-11-25 21:44:34.296 INFO  o.a.c.framework.imps.EnsembleTracker - New config event received: {}
2024-11-25T21:44:34.3918506Z     [Test worker-EventThread] - 2024-11-25 21:44:34.297 DEBUG o.a.c.framework.imps.EnsembleTracker - Ignoring new config as it is empty
2024-11-25T21:44:34.3919847Z     [Test worker] - 2024-11-25 21:44:34.298 INFO  p.a.t.h.i.d.DefaultDatacenterNameProvider - Providing default datacenter name: dc
  1. Restarting zookeeper/all modules when any module should be restarted didn't help.
  2. Upgrading curator framework didn't help.

Questions for further analysis

  1. Why some of the connections don't cause logging Accepted socket connection from?
  2. How many connections there are on zookeeper container when the 404/pending subscriptions are visible? - we can run a command on a zookeeper container to check once we spot that problems.
  3. How it is possible that the clients receive sessionId from zookeeper server, but the server does not accept the connection (does not log the new connections logs on its side)? Zookeeper code

@faderskd faderskd added the bug label Nov 25, 2024
@faderskd faderskd force-pushed the fixing-ci-refactored branch from d1b00ba to 2435cb2 Compare November 25, 2024 16:36
@faderskd faderskd marked this pull request as ready for review November 27, 2024 09:58
@moscicky moscicky self-requested a review November 27, 2024 09:59
@faderskd faderskd force-pushed the fixing-ci-refactored branch from 2435cb2 to 3a12acb Compare December 4, 2024 11:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant