
Use existing log for replication, even if purged from the PoV of openraft #1260

Closed · schreter opened this issue Oct 28, 2024 · 3 comments

@schreter (Collaborator) commented Oct 28, 2024

Currently, openraft instructs the log storage to purge the log after a snapshot. If a lagging replica reconnects afterwards, this snapshot is sent instead of the log.
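Roughly, the per-follower decision looks like this (a simplified sketch with illustrative names, not the actual openraft code):

```rust
/// Simplified sketch of the per-follower replication decision: once the
/// follower's next needed entry has been purged, the only remaining
/// option is to transfer the snapshot.
enum Replication {
    /// The needed entries are gone; transfer the snapshot instead.
    Snapshot,
    /// The log still covers the follower; stream entries from `from`.
    LogEntries { from: u64 },
}

fn choose_replication(follower_next: u64, first_kept_index: u64) -> Replication {
    if follower_next < first_kept_index {
        Replication::Snapshot
    } else {
        Replication::LogEntries { from: follower_next }
    }
}
```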

However, it's not always desirable to do so. Suppose the snapshot is big but frequent. Then we can keep more of the log to recover a lagging replica significantly more efficiently. The state machine can decide how much log to actually keep, balancing log replay (shorter recovery time and less network traffic) against snapshot replication (potentially less work on the follower, but much more network traffic). In our project, we'd like to keep some percentage of the log (say, 20% of the snapshot size) to be able to recover lagging replicas from the log.

There are a few approaches I can think of for using the still-existing log:

  • the purge() call can return the log ID up to which the log was actually purged
  • before calling purge(), ask the storage whether there is a log ID that should be kept, and adjust the purge position accordingly
  • when deciding whether to send a snapshot or the log, first ask the log storage to deliver logs starting at the required log ID

The first two are more straightforward for implementors of the storage, since there is a well-defined point at which to purge logs and decide how much log to keep. OTOH, there are some assertions in openraft which may become invalid when implementing the first solution.

The last one requires the storage implementation to keep logs that are being streamed available even in the presence of later purge() calls, which is problematic and thus much more complex to implement correctly in the storage.

My preferred solution would be the second one.
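A minimal sketch of what the second approach could look like, assuming a new hook on the log storage (hypothetical, not an existing openraft API), with the 20%-of-snapshot policy from above as the example:

```rust
/// Hypothetical hook for approach two (not an existing openraft API):
/// before purging, the core asks the storage for the lowest log index it
/// wants to keep and clamps the purge position below it.
trait PurgePolicy {
    /// Lowest log index the storage wants to retain, if any.
    fn keep_from(&self) -> Option<u64>;
}

/// Example policy: retain roughly 20% of the snapshot size worth of log.
/// All sizes and the average-entry-size estimate are illustrative.
struct KeepFractionOfSnapshot {
    snapshot_bytes: u64,
    avg_entry_bytes: u64,
    last_log_index: u64,
}

impl PurgePolicy for KeepFractionOfSnapshot {
    fn keep_from(&self) -> Option<u64> {
        let budget = self.snapshot_bytes / 5; // ~20% of the snapshot
        let keep_entries = budget / self.avg_entry_bytes.max(1);
        Some(self.last_log_index.saturating_sub(keep_entries))
    }
}
```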

Opinions?

BTW, regarding purge calls: if I understand it correctly, the snapshot & purge can currently happen while the log is being replicated to a follower, potentially causing the log reader to fail because the log was purged concurrently. Any take on this?

If so, then we need to implement some sort of postponement of log purging anyway. Probably the second solution is the simplest one: in this case, we have two sources of purge postponement, the current replication state and the state from the storage.

Update: I found the InFlight handling, so I suppose this question is moot and it works as it should.


👋 Thanks for opening this issue!

Get help or engage by:

  • /help : to print help messages.
  • /assignme : to assign this issue to you.

@schreter (Collaborator, Author) commented:

Looking at the code, the second solution is actually pretty simple to realize by updating LogHandler::calc_purge_upto(), since it already has provisions to keep an absolute number of logs. The only question is how to wire a custom call into it.
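For context, the gist of that calculation, paraphrased (not the actual openraft source): the purge point is derived from the snapshot line minus a configured number of logs to keep, so a storage-provided keep-hint like the one sketched above could be folded in as one more lower bound:

```rust
/// Paraphrase of the purge-point calculation (illustrative, not openraft's
/// actual source). Keep `max_in_snapshot_log_to_keep` entries below the
/// snapshot line; `storage_keep_from` is the hypothetical custom hook.
fn calc_purge_upto(
    snapshot_last_index: u64,
    max_in_snapshot_log_to_keep: u64,
    storage_keep_from: Option<u64>,
) -> Option<u64> {
    // Nothing to purge if the snapshot covers fewer entries than we keep.
    let by_config = snapshot_last_index.checked_sub(max_in_snapshot_log_to_keep)?;
    match storage_keep_from {
        // Never purge at or past the index the storage wants to retain.
        Some(keep) => Some(by_config.min(keep.checked_sub(1)?)),
        None => Some(by_config),
    }
}
```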

@schreter (Collaborator, Author) commented:

Further investigation has shown that it's possible to basically turn off purging and let the application purge on demand. Thus, I'm closing this for now and will try that approach first.
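For reference, a minimal sketch of that approach, assuming openraft's Config::max_in_snapshot_log_to_keep field and the Raft::trigger().purge_log() API described in the openraft docs (verify both against the openraft version in use):

```rust
use openraft::{Config, Raft, RaftTypeConfig};

/// Effectively disable automatic purging: openraft keeps (practically) all
/// logs already covered by a snapshot, so it never purges on its own.
fn no_auto_purge_config() -> Config {
    Config {
        max_in_snapshot_log_to_keep: u64::MAX,
        ..Default::default()
    }
}

/// Application-driven purge: the application decides the purge point and
/// asks openraft to purge up to it via the trigger API.
async fn purge_on_demand<C: RaftTypeConfig>(raft: &Raft<C>, upto: u64) {
    if let Err(e) = raft.trigger().purge_log(upto).await {
        eprintln!("manual purge failed: {e}");
    }
}
```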
