Move tracking of unsplittable tablets from manager memory to metadata table #4177
keith-turner added a commit to keith-turner/accumulo that referenced this issue Jan 20, 2024
1. Moves finding split points for a tablet from a custom thread pool in the manager into FATE. This is done by the new Repo named FindSplits.
2. Stops tracking which splits are running in manager memory and moves this into FATE. This is not implemented yet; the plan is to do that in another PR.
3. Stops tracking which tablets are unsplittable in manager memory. This should be tracked in the metadata table per apache#4177. This change can be in its own commit. In this commit the in-memory tracking is simply removed.

These changes will reduce manager memory usage. They will also make splitting tablets work much better if FATE is distributed, for two reasons. First, the computation can spread over multiple manager processes. Second, tracking of what is splitting can move from memory to persisted storage and no longer rely on a map in a single process.

This is a WIP commit because it requires FATE changes. The needed FATE changes were stubbed out in this PR and implemented with an in-memory set that would need to be persisted.
keith-turner added a commit that referenced this issue Feb 21, 2024
This commit makes the following changes:

1. Moves finding split points for a tablet from a custom thread pool in the manager into FATE. This is done by the new Repo named FindSplits.
2. Stops tracking which splits are running in manager memory and moves this into FATE. This is not implemented yet; the plan is to do that in another PR.
3. Stops tracking which tablets are unsplittable in manager memory. This should be tracked in the metadata table per #4177. This change can be in its own commit. In this commit the in-memory tracking is simply removed.

These changes will reduce manager memory usage. They will also make splitting tablets work much better if FATE is distributed, for two reasons. First, the computation can spread over multiple manager processes. Second, tracking of what is splitting can move from memory to persisted storage and no longer rely on a map in a single process.
I have a prototype working for this issue. It still needs some polishing and more testing, but I should be able to submit a PR soon. I'm also holding off because I based the branch for this work on #4254 to avoid merge conflicts, since both PRs modify the metadata schema and add new columns. Once that PR is merged, I can rebase on elasticity and then create a PR for this.
When a tablet is larger than the split threshold but a split point cannot be found (this can happen if the tablet has a very large row with many columns), Accumulo tracks this. The tracking avoids repeatedly wasting resources on trying to find a split point for a tablet that cannot split. In the elasticity branch, tracking of which tablets cannot split is done in manager memory here,
This tracking could be moved to the metadata table by adding a new column to tablet metadata for tablets that are unsplittable. Tracking this information in the metadata table would reduce memory pressure on the manager and would make running splits in distributed FATE possible. It's also nice because the information would survive manager restarts, so a restart would not cause a flurry of unneeded computation.
This column could record the following information.
The tablet management iterator is used to find tablets that need to split. This iterator could be modified to use the new column when determining whether a tablet needs to split. If the column exists, the iterator could check whether the table config or the tablet's files have changed since the column was set.
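The check described above could look roughly like the following. This is only a sketch under stated assumptions: the class `UnsplittableCheck`, its methods, and the idea of storing a SHA-256 digest of the split-related config plus the tablet's file names are all hypothetical and are not taken from the Accumulo code base or from this issue. The actual column contents and iterator code may differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: none of these names come from the Accumulo code base.
final class UnsplittableCheck {

  // Digest of the split-related table config and the tablet's file names,
  // as it might be stored in the proposed "unsplittable" metadata column.
  static byte[] fingerprint(Map<String, String> splitConfig, Set<String> files) {
    MessageDigest md;
    try {
      md = MessageDigest.getInstance("SHA-256");
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e); // SHA-256 is available on every JVM
    }
    // Sort entries so the digest does not depend on iteration order.
    for (Map.Entry<String, String> e : new TreeMap<>(splitConfig).entrySet()) {
      add(md, e.getKey());
      add(md, e.getValue());
    }
    for (String file : new TreeSet<>(files)) {
      add(md, file);
    }
    return md.digest();
  }

  private static void add(MessageDigest md, String s) {
    md.update(s.getBytes(StandardCharsets.UTF_8));
    md.update((byte) 0); // separator so ("ab","c") differs from ("a","bc")
  }

  // Returns true when it is worth searching for a split point.
  // storedFingerprint is null when the column is absent.
  static boolean shouldAttemptSplit(long tabletSize, long splitThreshold,
      byte[] storedFingerprint, Map<String, String> splitConfig, Set<String> files) {
    if (tabletSize <= splitThreshold) {
      return false; // tablet is not over the threshold
    }
    if (storedFingerprint == null) {
      return true; // never marked unsplittable
    }
    // Marked unsplittable: retry only if config or files changed since then.
    return !MessageDigest.isEqual(storedFingerprint, fingerprint(splitConfig, files));
  }
}
```

With this shape, a tablet that was marked unsplittable is skipped until its file set or split-related config changes, at which point the stored digest no longer matches and a new search for a split point is attempted.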