Bug in I/O sequence handling in initialization #1252
Comments
It appears that the The IO order is crucial: It's crucial to maintain the correct order of write operations to ensure system correctness. As stated in the Openraft documentation: openraft/openraft/src/storage/v2/raft_log_storage.rs Lines 22 to 24 in e54a1fd
Furthermore, the completion of write methods (such as IO events in the log: The
This log sequence suggests that the
The problem is that during initialization, the log append is initiated and completes asynchronously, while the vote is awaited inline. While the vote is being awaited, the log callback completes, but apparently too late: it wakes up the channel, which ultimately receives the completion, but you can't count on when the task will continue to run.

FYI: Flushing the log (and thus sending the I/O completion event) before writing the vote didn't help. Finally, we added an explicit await for the I/O of log entry 0 to the log append to work around the problem.
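To illustrate the workaround described above, here is a minimal sketch using only the Rust standard library. It is not openraft code and not our project's code: `Event`, `append_log_async`, `save_vote` and `initialize_with_workaround` are hypothetical names, and a thread plus an `mpsc` channel stand in for the asynchronous log writer and its completion callback. The only point is that the vote write does not start until the log completion has been received.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical storage events, recorded in the order they become durable.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Event {
    LogFlushed,
    VoteSaved,
}

/// Starts a log append on a background thread (a stand-in for an
/// asynchronously completing log write) and returns a receiver that
/// is signalled once the entry has been flushed.
fn append_log_async(journal: Arc<Mutex<Vec<Event>>>) -> mpsc::Receiver<()> {
    let (done_tx, done_rx) = mpsc::channel();
    thread::spawn(move || {
        // Simulate a slow flush.
        thread::sleep(Duration::from_millis(20));
        journal.lock().unwrap().push(Event::LogFlushed);
        // Completion callback: report that the flush is done.
        let _ = done_tx.send(());
    });
    done_rx
}

/// Writes the vote inline (synchronously), as in the scenario above.
fn save_vote(journal: &Arc<Mutex<Vec<Event>>>) {
    journal.lock().unwrap().push(Event::VoteSaved);
}

/// Workaround: explicitly wait for the log completion before writing the vote.
fn initialize_with_workaround() -> Vec<Event> {
    let journal = Arc::new(Mutex::new(Vec::new()));
    let log_done = append_log_async(Arc::clone(&journal));
    // Block until the log writer reports completion.
    log_done
        .recv()
        .expect("log writer dropped the completion channel");
    save_vote(&journal);
    let events = journal.lock().unwrap().clone();
    events
}
```

Without the `recv()` call, `save_vote` could run before the background flush finishes, which is the reordering we ran into.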
The callback for
Not only flushing the log first, but the callback for
In our project, log I/O and vote I/O (and some others) are decoupled. I was under the impression that vote I/O is independent, so it was my mistake.
Our Therefore, I think that our workaround of awaiting log I/O before writing the vote is actually the solution (and I added I need to test whether it's also sufficient for log entry 0.0, which is scheduled somewhat differently in the
If all I/O operations submitted to
The IO operations during
You are right. I misread the code. What I meant is that So the bug is then indeed only on our side.

I'm closing this issue and will document it on our side so the next colleague won't run into the problem again. Thanks for the help and sorry for the misunderstanding.
We observe the following situation in our project:

- `AppendInputEntries` with membership configuration to the engine, with log ID 0.0 and vote 0.None
- `AppendInputEntries` is executed, I/O state is so far unchanged
- `SaveVote` is executed, afterwards I/O state is T1-N3 flushed & accepted
- `AppendInputEntries` completes (asynchronously) and tries to set I/O state to flushed T0-NNone, which panics

It seems like the I/O sequence handling is buggy in the presence of vote storing, which is executed inline while `AppendEntries` is still running. Probably the same will also happen later in a similar situation upon an I/O hiccup (this was at startup).

I attached a minimized trace from `openraft` showing the issue: log.txt.

The question is, do we do something wrong, or is the I/O sequence handling in `openraft` broken for asynchronously-completing calls? If the latter, how to fix it?

This is `openraft` at `master~1`.

Thanks.
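For readers who want to see why the last step fails, the sequence above can be replayed against a toy model of a monotonic I/O progress tracker. This is an illustrative sketch only: `IoId`, `IoProgress` and the two `replay_*` functions are hypothetical names, not openraft types, and the model returns an `Err` where the report describes a panic. It assumes only that the flushed I/O state must never move backwards.

```rust
/// Hypothetical, simplified I/O identifier: the (term, node) of the vote
/// under which the I/O was submitted. The derived `Ord` compares fields in
/// declaration order, and `None` sorts before `Some(_)`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct IoId {
    term: u64,
    node: Option<u64>,
}

/// Toy tracker for the last flushed I/O; it must advance monotonically.
#[derive(Debug, Default)]
struct IoProgress {
    flushed: Option<IoId>,
}

impl IoProgress {
    /// Records a flush completion, rejecting any that would move backwards.
    fn flush(&mut self, id: IoId) -> Result<(), String> {
        if let Some(prev) = self.flushed {
            if id < prev {
                return Err(format!(
                    "non-monotonic flush: {:?} reported after {:?}",
                    id, prev
                ));
            }
        }
        self.flushed = Some(id);
        Ok(())
    }
}

/// Replays the reported order: the vote (T1-N3) is flushed first, then the
/// late log completion arrives carrying the older T0-None state.
fn replay_issue_sequence() -> Result<(), String> {
    let mut progress = IoProgress::default();
    progress.flush(IoId { term: 1, node: Some(3) })?;
    progress.flush(IoId { term: 0, node: None })
}

/// Replays the same two completions in submission order.
fn replay_ordered_sequence() -> Result<(), String> {
    let mut progress = IoProgress::default();
    progress.flush(IoId { term: 0, node: None })?;
    progress.flush(IoId { term: 1, node: Some(3) })
}
```

In this model the first replay is rejected and the second is accepted, which matches the observation that the failure depends on the order in which the completions are reported.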