Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Made ITs that restart MAC faster #5246

Open
wants to merge 2 commits into
base: 2.1
Choose a base branch
from

Conversation

dlmarion
Copy link
Contributor

Modified MAC so that it cleaned up lock paths
in ZooKeeper and ZooCache when stopping. Noticed
that in tests that restarted MAC the Manager
process would wait for the previous lock to be
removed on the session timeout. The lock paths
would also be cached in ZooCache and not updated
right away because the Watcher would not fire
when MAC was stopped, a ConnectionLoss error
would be returned when MAC started, and it would
take a while for ZooCache to fix itself.

Modified MAC so that it cleaned up lock paths
in ZooKeeper and ZooCache when stopping. Noticed
that in tests that restarted MAC the Manager
process would wait for the previous lock to be
removed on the session timeout. The lock paths
would also be cached in ZooCache and not updated
right away because the Watcher would not fire
when MAC was stopped, a ConnectionLoss error
would be returned when MAC started, and it would
take a while for ZooCache to fix itself.
@dlmarion dlmarion added this to the 2.1.4 milestone Jan 10, 2025
@dlmarion dlmarion requested a review from ctubbsii January 10, 2025 22:53
@dlmarion dlmarion self-assigned this Jan 10, 2025
@dlmarion
Copy link
Contributor Author

These changes shaved about 30s off of VolumeIT locally for me in 2.1. In main, with more tests, it shaved off about 50s.

if (siteConf.getBoolean(Property.INSTANCE_RPC_SASL_ENABLED)) {
SecurityUtil.serverLogin(siteConf);
}
zap(siteConf, args);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was the change in this file only a refactoring to put the code in a the try block in a method? Seems like no code changes were made besides that but was not sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the diff ignoring whitespace changes makes this easier to see. It looks like the only thing that changed was that the jcommander option parsing got moved into the try block. Otherwise, I don't think there's any substantive change in this part.

Comment on lines +847 to +850
// When ZooKeeper was stopped in the previous method call,
// the local ZooKeeper watcher did not fire. If MAC is
// restarted, then ZooKeeper will start on the same port with
// the same data, but no Watchers will fire.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does MAC even have the ability to restart? I don't think that's a feature we have, is it? Regardless, I think watchers would fire with SessionExpired in this case, which the ZooCache watcher does handle correctly. I believe my ZooSession from #5192, which was just merged to the main branch should also handle this correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that a session expired KeeperException will be returned to the ZooKeeper client when the ZooKeeper server is re-started on the same port. But I don't think that watchers will fire when the first ZooKeeper server is terminated, or when the new ZooKeeper server is started. The code does eventually behave correctly, as the tests do complete successfully, but removing these paths from the ZooCache speed things up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants