-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix interrupt transaction race condition #1107
base: main
Are you sure you want to change the base?
Conversation
…er node that is not producing for trx execution
…e an enum for timer state instead of a bool expired. This allows differentiation between timer expired and interruption.
@@ -17,7 +15,8 @@ struct platform_timer { | |||
|
|||
void start(fc::time_point tp); | |||
void stop(); | |||
void expire_now(); | |||
void interrupt_timer(); | |||
void _expire_now(); // called by internal timer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
called by internal timer
Can be made private then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not without some friends and with the multiple impls that becomes rather a pain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the impls are the same platform_timer
struct though, not separate classes. And even _state
is already private.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'm a bit surprised that worked.
auto end = std::chrono::high_resolution_clock::now(); | ||
int timer_slop = std::chrono::duration_cast<std::chrono::microseconds>(end-start).count() - interval; | ||
timer.stop(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not seeing why this added line is necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added it because I originally added an assert that start() only called when timer is stopped. However, we don't actually honor that constraint currently elsewhere. Seems like it is better to honor that constraint normally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decided it best to fix the invariant that stop()
always called before start()
. Surprise-surprise the issue there is with deferred-transactions.
call_expiration_callback(); | ||
} | ||
} | ||
|
||
void platform_timer::stop() { | ||
if(expired) | ||
if(_state == state_t::stopped) | ||
return; | ||
|
||
my->timer->cancel(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it make sense to call my->timer->cancel();
only if (_state == state_t::running)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This now needs to set to stopped, so the assert on start() is not triggered.
state_t expected = state_t::running; | ||
if (_state.compare_exchange_strong(expected, state_t::timed_out)) { | ||
call_expiration_callback(); | ||
} | ||
} | ||
|
||
void platform_timer::interrupt_timer() { | ||
state_t expected = state_t::running; | ||
if (_state.compare_exchange_strong(expected, state_t::interrupted)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't these call my->timer->cancel()
?
Isn't it possible that someone calls interrupt_timer()
(which updates the state but doesn't cancel the async_wait
), and then calls start(fc::time_point::maximum())
, and may be surprised when the previous async_wait
expires the timer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would need to add a mutex around the timer to do that. I think the intention is that stop() should be called before calling start() again.
Note:start |
…tion_context to call stop() on undo/squash
The support for interrupting transaction on new best head, #1047, has a race condition such that an interrupt could interrupt a speculative transaction instead of a transaction in apply block. This would cause the transaction to fail and not be re-tried on the node.
This PR refactors the
platform_timer
to keep track if it was trigger by an explicit interrupt or by the timer timing out. This allows fortransaction_context::checktime
to directly tell if a transaction was interrupted and raise an appropriate interrupt exception. If a transaction is interrupted it can then be retried similar to how a transaction is retried if it hits a block boundary.Resolves #1095