Assess the number of ZooKeeper watches being set by different process types. #5134

Open
keith-turner opened this issue Dec 4, 2024 · 4 comments
Labels
blocker This issue blocks any release version labeled on it.
Comments

@keith-turner
Contributor

Accumulo code is structured to minimize the number of ZooKeeper connections and the number of ZooKeeper watches. With the large number of changes made in 4.0, it is possible that these goals are not being met as well as they used to be. Developing a way to measure the number of connections and watches per process would help determine whether these goals are still being met. It would be nice to be able to see this information per server and per client. There may not be any Accumulo changes needed for this; it may be possible to do this with existing ZooKeeper mechanisms and some scripts.
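One existing mechanism that might cover this is the set of ZooKeeper four-letter-word admin commands: `wchc` lists watches grouped by session and `cons` lists connections with their session ids, so the two could in principle be joined to attribute watches to client hosts. Both commands have to be enabled through `4lw.commands.whitelist`, and `wchc` can be expensive on a server holding many watches. The sketch below only covers the scripting side. It assumes `wchc` output consists of a session id line followed by one indented line per watched path, and `WchcWatchCounter` is a hypothetical helper name, not something in Accumulo or ZooKeeper.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: counts watches per session from captured "wchc" output. */
final class WchcWatchCounter {

  /**
   * Assumed input format: a session id line (for example "0x100000a1b2c0001")
   * followed by one indented line per path watched by that session.
   */
  static Map<String, Integer> countBySession(String wchcOutput) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    String currentSession = null;
    for (String line : wchcOutput.split("\\R")) {
      if (line.trim().isEmpty()) {
        continue; // skip blank lines
      }
      if (Character.isWhitespace(line.charAt(0))) {
        // Indented line: a path watched by the current session.
        if (currentSession != null) {
          counts.merge(currentSession, 1, Integer::sum);
        }
      } else {
        // Unindented line: start of a new session block.
        currentSession = line.trim();
        counts.putIfAbsent(currentSession, 0);
      }
    }
    return counts;
  }
}
```

The output of `wchc` would be captured separately (for example by a script talking to the ZooKeeper client port) and passed in as a string; mapping a session id back to a specific Accumulo process would still need the `cons` output or similar.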

@keith-turner keith-turner added this to the 4.0.0 milestone Dec 4, 2024
@keith-turner
Contributor Author

Accumulo will select random servers for some client operations. If there are a lot of servers and a client is making lots of API calls that each select a random server, then what happens with ZK watches in that client over time? This is an example of a scenario where it would be nice to have data for clients in addition to servers.

@keith-turner keith-turner added the blocker This issue blocks any release version labeled on it. label Dec 6, 2024
@keith-turner
Contributor Author

ZooKeeper has a class for inspecting a client's watches. Maybe Accumulo processes could be made to call that and dump their watches. Could also set up metrics in each process to report the number of watches it has.

https://zookeeper.apache.org/doc/current/apidocs/zookeeper-server/org/apache/zookeeper/server/watch/WatchManager.html

@keith-turner
Contributor Author

Discovered that WatchManager is not public API in ZooKeeper, so it may not be possible to get a count of watches for a ZooKeeper object.
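If the count cannot be read from the ZooKeeper object, one possible fallback is for each process to keep its own bookkeeping and expose it as a metric. The following is a minimal sketch under stated assumptions: `WatchTracker` and its method names are hypothetical, it tracks paths only (no distinction between data, exists, and child watches), it assumes one-shot watches that disappear when they fire, and it would also need to be cleared when the session expires.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-process bookkeeping; not part of Accumulo or ZooKeeper. */
final class WatchTracker {
  // A set, so setting a watch on the same path twice is counted once.
  private final Set<String> watchedPaths = ConcurrentHashMap.newKeySet();

  /** Call after a read that set a watch on the path succeeds. */
  void watchSet(String path) {
    watchedPaths.add(path);
  }

  /** Call from the watcher callback; a one-shot watch is gone once it fires. */
  void watchFired(String path) {
    watchedPaths.remove(path);
  }

  /** Value that a per-process metrics gauge could report. */
  int outstandingWatches() {
    return watchedPaths.size();
  }
}
```

This only approximates what the ZooKeeper client holds internally, so it would be worth cross-checking against server-side numbers before trusting it.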

@keith-turner
Contributor Author

The comment linked below proposed a potential way for ZooCache to deal with entries that are read from ZooCache only once. This may or may not actually be a problem. Need to collect data on how often this happens in a live Accumulo system and go from there. That data collection could possibly be done as part of this issue.

#5143 (comment)

dlmarion added a commit to dlmarion/accumulo that referenced this issue Dec 13, 2024
This commit removes most of the places where ZooCache
instances were being created in favor of re-using the
ZooCache from the ClientContext. Additionally, this
commit does not place a Watcher on each node that is
cached and instead places a single persistent
recursive Watcher at the paths in which the caching
is taking place.

This change roughly reduces the Watchers reported in
WatchTheWatchCountIT by 50%. While reducing the number
of Watchers, this commit could reduce ZooKeeper server
performance in two ways:

  1. There is a note in the ZooKeeper javadoc for the
     AddWatchMode enum that states there is a small
     performance decrease when using recursive watchers
     as all of the segments of ZNode paths need to be
     checked for watch triggering.

  2. Because a Watcher is not set on each node this
     commit modified the ZooCache.ZCacheWatcher to
     remove the parent of the triggered node, the
     triggered node, and all of its siblings from the
     cache. This overmatching may mean increased
     lookups in ZooKeeper.

Related to apache#5134
Closes apache#5154, apache#5157
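The overmatching described in point 2 of the commit message above can be illustrated with a small model. This is a sketch only, not the actual ZooCache.ZCacheWatcher code: `OvermatchingCache` and its methods are hypothetical names, and the model follows the commit message literally, so descendants of the triggered node are left untouched.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical model of the overmatching invalidation; not Accumulo's ZooCache. */
final class OvermatchingCache {
  private final Map<String, byte[]> data = new ConcurrentHashMap<>();

  void put(String path, byte[] value) {
    data.put(path, value);
  }

  boolean contains(String path) {
    return data.containsKey(path);
  }

  /** Returns the parent path, or "/" for top-level nodes and the root. */
  static String parentOf(String path) {
    int idx = path.lastIndexOf('/');
    return idx <= 0 ? "/" : path.substring(0, idx);
  }

  /** Called when the single recursive watcher reports a change at eventPath. */
  void invalidate(String eventPath) {
    String parent = parentOf(eventPath);
    // Drop the parent, the triggered node, and every sibling of the triggered node.
    data.keySet().removeIf(p -> p.equals(parent) || parentOf(p).equals(parent));
  }
}
```

In this model a change to one child evicts all of its cached siblings, so the next read of any of them goes back to ZooKeeper, which is the increased lookup cost the commit message mentions.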
dlmarion added a commit to dlmarion/accumulo that referenced this issue Jan 14, 2025
Reused ZooCache from client / server context where possible. Removed
overloaded ZooCache constructor to make it easier to find where new
instances are constructed. Removed watcher that was being placed in
each call to the underlying ZooKeeper in favor of long-lived persistent
recursive watchers set on specific paths which will fire when any
child under that path is modified.

Related to apache#5134
Closes apache#5154, apache#5157