3966 add_epoch_root and sync_l1 on Membership #3984
Conversation
```rust
    .block_number();

// Skip if this is not the expected block.
if task_state.epoch_height != 0 && (decided_block_number + 3) % task_state.epoch_height == 0 {
```
Currently, what would happen if a node misses the block when the update is triggered? Would it never have `add_epoch_root` called for that epoch?

I'm wondering if the `Membership` trait should have a method like `is_epoch_initialized(&self, epoch) -> bool`, so that if the node is not online at the beginning of the epoch we can still initialize the membership for that epoch. But then we would also have to dig the correct header out of history, which I think might be cumbersome.

So maybe this should be handled with catchup in the confirmation layer? What do you think @imabdulbasit?
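For concreteness, a minimal sketch of the kind of query method meant here (purely hypothetical; not part of this PR):

```rust
// Hypothetical sketch only, not part of this PR: a Membership-like trait with
// the suggested query, so a node that comes online mid-epoch can tell whether
// it still needs to initialize the membership for that epoch.
type Epoch = u64; // stand-in for the real epoch type

trait EpochAwareMembership {
    /// Returns true if the membership for `epoch` has already been set up
    /// (e.g. via add_epoch_root), false if it still needs to be initialized.
    fn is_epoch_initialized(&self, epoch: Epoch) -> bool;
}
```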
I think doing a catchup is better, but I am wondering how we can do that now with Hotshot managing the locks
For our existing catchup it's hotshot that triggers the catchup (for example by calling `validate_and_apply_header`) and the confirmation layer that performs it. Maybe it should be the same here: hotshot calls something to trigger catchup for the membership, and the sequencer performs the catchup.

If we did it analogously for membership we would have to make all the reading functions async and take `&mut self`, which seems terrible. I think we can come up with something better.
For example, hotshot could call `add_epoch_root` (probably with another name) for every view; then the sequencer could decide whether it needs to do catchup for this epoch and, in most cases, do (almost) nothing and return `None` as the write callback. This way we ensure the catchup is actually triggered.
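A rough sketch of that flow (hypothetical names; nothing here is actual hotshot or sequencer code):

```rust
// Hypothetical sketch of the "call it every view, usually a no-op" idea.
// None of these names come from hotshot or the sequencer; they only
// illustrate the control flow described above.
use std::collections::HashSet;

type Epoch = u64;
type BlockHeader = Vec<u8>; // stand-in for the real header type

struct Committee {
    initialized_epochs: HashSet<Epoch>,
}

impl Committee {
    /// Called by consensus on every view. Returns a write callback only when
    /// this epoch actually needs catchup; in the common case it is a no-op.
    fn maybe_add_epoch_root(
        &self,
        epoch: Epoch,
        _root_header: BlockHeader,
    ) -> Option<Box<dyn FnOnce(&mut Self) + Send>> {
        if self.initialized_epochs.contains(&epoch) {
            // Most views: membership for this epoch is already set up.
            return None;
        }
        // The epoch root was missed earlier (e.g. the node was offline), so do
        // the (possibly expensive) catchup and record it via the callback.
        Some(Box::new(move |this: &mut Self| {
            this.initialized_epochs.insert(epoch);
        }))
    }
}
```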
I think our overall design is a bit sub-optimal because it's an invariant that the membership is static for an epoch, but that invariant isn't really reflected in the code. I think eventually we should enforce this in hotshot, and the confirmation layer should provide a constructor for an `EpochMembership` type that takes an epoch as input and is called by hotshot where needed. @ss-es mentioned this would be difficult to do in hotshot, but I'm not sure pushing the complexity elsewhere is actually better.
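A minimal sketch of that `EpochMembership` shape (also hypothetical; nothing like this exists in hotshot today):

```rust
// Hypothetical sketch of the EpochMembership idea: a per-epoch view of the
// membership whose constructor is the one place where catchup can happen.
// Names and fields are illustrative only.
type Epoch = u64;

struct EpochMembership {
    epoch: Epoch,
    // ... stake table, leader schedule, etc., all fixed for this epoch
}

impl EpochMembership {
    /// Built by the confirmation layer for a given epoch (doing catchup here
    /// if needed). Because the contents are fixed at construction time, the
    /// "membership is static for an epoch" invariant is enforced by the type
    /// rather than by convention.
    fn for_epoch(epoch: Epoch) -> Self {
        Self { epoch }
    }
}
```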
Yeah, but then hotshot has to decide when to call it? I think calling it for every view would maybe be too much. Would it make sense to do catchup when the membership getter functions do not return any data?
Maybe we do catchup at node startup and pass the `Committee` after catchup, so hotshot does not need to do anything?
I think catchup at startup might not work if a node misses some views for some other reason, for example due to a temporary network outage.
> Yeah, but then hotshot has to decide when to call it? I think calling it for every view would maybe be too much.

If it doesn't do anything except once per epoch, then what's the problem with calling it every view?
> I'm wondering if the `Membership` trait should have a method like `is_epoch_initialized(&self, epoch) -> bool`, so that if the node is not online at the beginning of the epoch we can still initialize the membership for that epoch. But then we would also have to dig the correct header out of history, which I think might be cumbersome.
Why do we need this? When a node comes online it should go through the same routine as all nodes, which includes calling `add_epoch_root` for the block being proposed/validated. Hotshot should ensure mechanisms are in place to catch up the late-joining node. I don't see the need for an extra mechanism.
I might be reading it wrongly, but I think the current implementation in this PR only calls the function if `(decided_block_number + 3) % task_state.epoch_height == 0`, so if a node joins at a later point and the decided block has moved on, it won't call it. Maybe I'm misunderstanding.
```rust
    epoch_from_block_number(decided_block_number, task_state.epoch_height) + 1,
);

let membership_reader = task_state.membership.read().await;
```
Possibly personal preference, but I think it is more idiomatic to introduce a scoped block rather than explicit `drop`s. Something like:

```rust
let write_callback = {
    let membership_reader = task_state.membership.read().await;
    membership_reader.add_epoch_root(next_epoch_number, proposal.block_header.clone())
};
```
done
Left some comments, though I'm not necessarily sure about any of them.
crates/types/src/traits/election.rs (outdated):
```rust
fn uses_sync_l1() -> bool {
    false
}
```
Is this actually needed with the current signature for `sync_l1`?
Actually, I feel like we wouldn't want to do this even if we could. I think we'd (eventually) always try to take this lock in deployment, so this would only help us in tests, but I don't think we want our tests to have different locking behavior. I might be misunderstanding, though.
crates/task-impls/src/helpers.rs (outdated):
```rust
if TYPES::Membership::uses_sync_l1() {
    let write_callback = {
        let membership_reader = membership.read().await;
        membership_reader.sync_l1().await
    };
```
Rather than checking `uses_sync_l1()`, we can just check whether `add_epoch_root` actually did something, and if so, immediately apply the callback and do the sync.
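Roughly, at the call site (sketch only; it reuses the names already in this PR's diff rather than being an exact patch):

```rust
// Sketch: apply the add_epoch_root callback if there was one, and only in that
// case immediately do the L1 sync as well.
if let Some(write_callback) = write_callback {
    {
        let mut membership_writer = membership.write().await;
        write_callback(&mut *membership_writer);
    }

    // add_epoch_root actually did something, so sync with the L1 right away.
    let sync_callback = {
        let membership_reader = membership.read().await;
        membership_reader.sync_l1().await
    };
    if let Some(sync_callback) = sync_callback {
        let mut membership_writer = membership.write().await;
        sync_callback(&mut *membership_writer);
    }
}
```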
Overall looks good to me, though I think it's worth merging main in and using `is_epoch_root`.
crates/task-impls/src/helpers.rs (outdated):
```rust
    .block_number();

// Skip if this is not the expected block.
if epoch_height != 0 && (decided_block_number + 3) % epoch_height == 0 {
```
#3948 just added `hotshot_types::utils::is_epoch_root`, which I think we should use here too. (I'm not 100% sure we're going with the 3rd-to-last block, but we should definitely be using the same thing as we do in the DRB.)
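For illustration, the stand-in below just reproduces the manual check currently in this PR; the point is that the real helper from #3948 should be called instead, so this check and the DRB can't drift apart (whatever definition of the root block it uses):

```rust
// Illustrative only: a local stand-in written to match this PR's manual check,
// (decided_block_number + 3) % epoch_height == 0. In the real code the shared
// hotshot_types::utils::is_epoch_root helper from #3948 should be used instead,
// so that this check and the DRB share one definition of the epoch root.
fn is_epoch_root(block_number: u64, epoch_height: u64) -> bool {
    block_number % epoch_height == epoch_height - 3
}

fn main() {
    let epoch_height = 100u64;
    for decided_block_number in [96u64, 97, 196, 197] {
        let manual = (decided_block_number + 3) % epoch_height == 0;
        assert_eq!(manual, is_epoch_root(decided_block_number, epoch_height));
        println!("{decided_block_number}: epoch root = {manual}");
    }
}
```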
```rust
    let write_callback = {
        let membership_reader = membership.read().await;
        membership_reader.sync_l1().await
    };

    if let Some(write_callback) = write_callback {
        let mut membership_writer = membership.write().await;
        write_callback(&mut *membership_writer);
    }
}
```
I think we might want this in a `tokio::spawn`, so we don't block on the `sync_l1` call, but I'm not 100% sure what the right approach is here (e.g. I don't know if doing that adds deadlock risks). Probably also fine to leave as-is and deal with it if problems come up.
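Something like the following is roughly what the spawned version might look like (a fragment mirroring the lines above; it assumes the membership handle is an `Arc<RwLock<..>>` that can be cloned into the task):

```rust
// Sketch of moving the L1 sync off the hot path (names mirror the diff above;
// whether the extra concurrent writer is safe with the existing lock usage
// would need to be checked).
let membership = Arc::clone(&membership);
tokio::spawn(async move {
    let write_callback = {
        let membership_reader = membership.read().await;
        membership_reader.sync_l1().await
    };
    if let Some(write_callback) = write_callback {
        let mut membership_writer = membership.write().await;
        write_callback(&mut *membership_writer);
    }
});
```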
Closes #3966
This PR:
Adds `add_epoch_root` and `sync_l1` to the `Membership` trait, calling them from within `quorum_vote`.
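Rough sketch of the shape of these additions, inferred from the call sites discussed in the review above (exact signatures in the code may differ):

```rust
// Rough shape of the new Membership methods; both return an optional callback
// that is later applied under the membership write lock. Names match the PR,
// but the exact signatures here are assumptions.
type Epoch = u64;
type BlockHeader = Vec<u8>; // stand-in for the real header type

trait MembershipExt {
    /// Called when the epoch root block is decided. Returns a callback that
    /// applies the new epoch's membership under the write lock, or None if
    /// nothing needs to change.
    fn add_epoch_root(
        &self,
        epoch: Epoch,
        block_header: BlockHeader,
    ) -> Option<Box<dyn FnOnce(&mut Self) + Send>>;

    /// Syncs membership state with the L1, again returning an optional
    /// callback to apply under the write lock (called as `sync_l1().await`).
    async fn sync_l1(&self) -> Option<Box<dyn FnOnce(&mut Self) + Send>>;
}
```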
This PR does not:
Key places to review: