
3966 add_epoch_root and sync_l1 on Membership #3984

Merged: 1 commit merged into main from the ps/3966B branch on Jan 3, 2025
Conversation

@pls148 (Contributor) commented Dec 19, 2024:

Closes #3966

This PR:

Adds add_epoch_root and sync_l1 to the Membership trait and calls them from within the quorum vote task (a rough sketch of the new trait surface is shown below).
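
For context, the new trait surface looks roughly like the following. This is a hedged sketch only: the exact generic bounds, the epoch and header types, the callback signature, and whether add_epoch_root is async are assumptions, not taken verbatim from the PR.

// Hedged sketch; types and bounds are illustrative only.
pub trait Membership: Sized + Send + Sync {
    type Epoch;
    type BlockHeader;

    /// Called when the epoch-root block is decided. May return a callback
    /// that the caller later applies under a write lock to install the
    /// membership for the new epoch.
    async fn add_epoch_root(
        &self,
        epoch: Self::Epoch,
        block_header: Self::BlockHeader,
    ) -> Option<Box<dyn FnOnce(&mut Self) + Send>>;

    /// Refresh L1-derived state (e.g. the stake table). Also returns an
    /// optional write callback.
    async fn sync_l1(&self) -> Option<Box<dyn FnOnce(&mut Self) + Send>>;
}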

This PR does not:

Key places to review:

.block_number();

// Skip if this is not the expected block.
if task_state.epoch_height != 0 && (decided_block_number + 3) % task_state.epoch_height == 0 {
@sveitser (Contributor) commented Dec 20, 2024:

Currently what would happen if a node misses the block when the update is triggered? Would it never have add_epoch_root called for that epoch?

I'm wondering if the Membership trait should have a method like is_epoch_initialized(&self, epoch) -> bool, so that if a node is not online for the beginning of the epoch we can still initialize the membership for that epoch (see the usage sketch below). But then we also have to dig out the correct header from history, which I think might be cumbersome.

So maybe this should be handled with catchup in the confirmation layer? What do you think @imabdulbasit ?
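
For concreteness, usage of the suggested query might look something like the following. This is a sketch of the idea in this comment only; neither the method nor these variable names are part of the PR.

// Hypothetical usage of the suggested method; not part of this PR.
let needs_init = {
    let membership_reader = task_state.membership.read().await;
    !membership_reader.is_epoch_initialized(current_epoch)
};
if needs_init {
    // Dig the epoch-root header for `current_epoch` out of history (the
    // cumbersome part mentioned above), then call add_epoch_root with it,
    // or fall back to catchup in the confirmation layer.
}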

Reply (Contributor):

I think doing a catchup is better, but I am wondering how we can do that now with Hotshot managing the locks

Reply (Contributor):

For our existing catchup it's hotshot that triggers the catchup (for example by calling validate_and_apply_header) and the confirmation layer that performs the catchup. Maybe it should be the same here: hotshot needs to call something to trigger catchup for the membership and the sequencer performs the catchup.

If we were to do it analogously for membership we would have to make all the reading functions async and &mut self, which seems terrible. I think we can come up with something better.

Reply (Contributor):

For example, hotshot could call add_epoch_root (probably under another name) for every view, and then the sequencer could decide whether it needs to do catchup for that epoch (in most cases doing (almost) nothing and returning None as the write_callback). This way we ensure the catchup is actually triggered (sketched below).

I think our overall design is a bit sub-optimal because it's an invariant that the membership is static for an epoch, but that invariant isn't really reflected in the code. I think eventually we should enforce this in hotshot, and the confirmation layer should somehow provide a constructor for an "EpochMembership" type, taking an epoch as input, that hotshot calls where needed. @ss-es mentioned this would be difficult to do in hotshot, but I'm not sure pushing the complexity elsewhere is actually better.
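
A minimal sketch of the call-every-view idea from the first paragraph, on the sequencer side, assuming an in-memory committee map. All names here (SequencerMembership, ensure_epoch_membership, the catchup closure) are illustrative, not from the PR.

use std::collections::BTreeMap;

type PubKey = Vec<u8>; // placeholder for a real staking key type

// Hypothetical sequencer-side membership state.
struct SequencerMembership {
    committees: BTreeMap<u64, Vec<PubKey>>, // epoch -> stake table
}

impl SequencerMembership {
    /// Called by hotshot every view (per the idea above). Returns None in the
    /// common case where the epoch is already initialized, so no write lock is
    /// ever taken; otherwise does the catchup work and returns a callback that
    /// installs the result under the write lock.
    fn ensure_epoch_membership(
        &self,
        epoch: u64,
        catchup: impl FnOnce() -> Option<Vec<PubKey>>,
    ) -> Option<Box<dyn FnOnce(&mut Self) + Send>> {
        if self.committees.contains_key(&epoch) {
            return None;
        }
        let committee = catchup()?;
        Some(Box::new(move |m: &mut Self| {
            m.committees.insert(epoch, committee);
        }))
    }
}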

Reply (Contributor):

Yeah, but then hotshot has to decide when to call it? I think calling it for every view might be too much. Maybe it makes sense to do catchup when the membership getter functions don't return any data?

@imabdulbasit (Contributor) commented Dec 20, 2024:

Maybe we do catchup at node startup and pass in the Committee after catchup, so hotshot does not need to do anything?

Reply (Contributor):

I think catchup at startup might not work if a node misses some views for some other reason, for example due to a temporary network outage.

Reply (Contributor):

Quoting the reply above: "yeah but then hotshot has to decide when to call it? I think calling it for every view would be too much maybe."

If it doesn't do anything except once per epoch, then what's the problem with calling it every view?

@tbro (Contributor) commented Dec 20, 2024:

Quoting the earlier suggestion: "I'm wondering if the Membership trait should have a method like is_epoch_initialized(&self, epoch) -> bool so that if the node is not online for the beginning of the epoch we can still initialize the membership for the epoch. But then we also have to dig out the correct header from history which I think might be cumbersome."

Why do we need this? When a node comes online it should go through the same routine as all nodes, which includes calling add_epoch_root for the block being proposed/validated. Hotshot should ensure mechanisms are in place to bring the late-joining node back in sync. I don't see the need for an extra mechanism.

Reply (Contributor):

I might be reading it wrongly, but I think the current implementation in this PR only calls the function if (decided_block_number + 3) % task_state.epoch_height == 0, so if a node joins at a later point and the decided block has moved on, it won't be called. Maybe I'm misunderstanding.

epoch_from_block_number(decided_block_number, task_state.epoch_height) + 1,
);

let membership_reader = task_state.membership.read().await;
Comment (Contributor):

Possibly personal preference, but I think it is more idiomatic to prefer introducing a scoped block over explicit drops. Something like:

let write_callback = {
    let membership_reader = task_state.membership.read().await;
    membership_reader.add_epoch_root(next_epoch_number, proposal.block_header.clone())
};
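
For contrast, the explicit-drop version this replaces would read roughly like the following sketch.

// The drop-based alternative the comment argues against (sketch only):
let membership_reader = task_state.membership.read().await;
let write_callback =
    membership_reader.add_epoch_root(next_epoch_number, proposal.block_header.clone());
drop(membership_reader); // easy to forget, and the lock scope is less obvious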

Reply (PR author):

done

@sveitser mentioned this pull request on Dec 20, 2024.
@pls148 force-pushed the ps/3966B branch 3 times, most recently from 2cb5789 to 13a9ce4, on January 3, 2025.
@ss-es (Contributor) left a review:

Left some comments, though I'm not necessarily sure about any of them.

Comment on lines 147 to 149
fn uses_sync_l1() -> bool {
    false
}
Comment (Contributor):

is this actually needed with the current signature for sync_l1?

Reply (Contributor):

actually, I feel like we wouldn't want to do this even if we could

I think we'd (eventually) always try to take this lock in deployment, so this would only help us in tests. but I don't think we want our tests to have different locking behavior

I might be misunderstanding though

crates/task-impls/src/helpers.rs: a resolved comment thread (not expanded here).
Comment on lines 201 to 205
if TYPES::Membership::uses_sync_l1() {
    let write_callback = {
        let membership_reader = membership.read().await;
        membership_reader.sync_l1().await
    };
Comment (Contributor):

rather than checking uses_sync_l1() we can just check whether add_epoch_root actually did something, and if so immediately apply the callback and the sync
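
A sketch of that restructuring, keying off the add_epoch_root result rather than a uses_sync_l1() flag. Variable names follow the surrounding diff; whether add_epoch_root is awaited, and the exact control flow in the merged code, are assumptions.

// Hedged sketch of the suggested control flow, not the merged code.
let epoch_root_callback = {
    let membership_reader = membership.read().await;
    membership_reader
        .add_epoch_root(next_epoch_number, proposal.block_header.clone())
        .await
};

if let Some(write_callback) = epoch_root_callback {
    // add_epoch_root actually did something: apply it, then sync with L1.
    {
        let mut membership_writer = membership.write().await;
        write_callback(&mut *membership_writer);
    }
    let sync_callback = {
        let membership_reader = membership.read().await;
        membership_reader.sync_l1().await
    };
    if let Some(write_callback) = sync_callback {
        let mut membership_writer = membership.write().await;
        write_callback(&mut *membership_writer);
    }
}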

@pls148 force-pushed the ps/3966B branch 3 times, most recently from 7b49557 to e6ab29f, on January 3, 2025.
@ss-es (Contributor) left a review:

overall looks good to me, though I think it's worth merging main in and using is_epoch_root

.block_number();

// Skip if this is not the expected block.
if epoch_height != 0 && (decided_block_number + 3) % epoch_height == 0 {
Comment (Contributor):

#3948 just added hotshot_types::utils::is_epoch_root which I think we should use here too (I'm not 100% sure if we're going with the 3rd last block, but we should definitely be using the same thing as we do in the drb)
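
That swap would look roughly like the following, assuming is_epoch_root takes the decided block number and the epoch height.

// Sketch: reuse the helper from #3948 instead of the hand-rolled modulus
// check, so this trigger stays consistent with the DRB logic.
use hotshot_types::utils::is_epoch_root;

if epoch_height != 0 && is_epoch_root(decided_block_number, epoch_height) {
    // ...trigger add_epoch_root for the next epoch...
}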

Comment on lines +204 to +215
let write_callback = {
    let membership_reader = membership.read().await;
    membership_reader.sync_l1().await
};

if let Some(write_callback) = write_callback {
    let mut membership_writer = membership.write().await;
    write_callback(&mut *membership_writer);
}
}
Comment (Contributor):

I think we might want this in a tokio::spawn, so we don't block on the sync_l1 call but I'm not 100% sure what the right approach is here (e.g. I don't know if doing that adds deadlock risks). probably also fine to leave as-is and deal with it if problems come up
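
A sketch of the spawn variant being discussed, assuming membership is an Arc<RwLock<_>> that can be cloned into the task; whether this introduces lock-ordering or deadlock issues is exactly the open question raised here.

// Hedged sketch of the alternative discussed; the hunk above shows the
// inline version. Assumes `membership: Arc<RwLock<_>>` and a tokio runtime.
let membership = Arc::clone(&membership);
tokio::spawn(async move {
    let write_callback = {
        let membership_reader = membership.read().await;
        membership_reader.sync_l1().await
    };
    if let Some(write_callback) = write_callback {
        let mut membership_writer = membership.write().await;
        write_callback(&mut *membership_writer);
    }
});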

@pls148 merged commit e8cd837 into main on Jan 3, 2025 (17 checks passed).
@pls148 deleted the ps/3966B branch on January 3, 2025.
Successfully merging this pull request may close issue #3966: Membership trait changes to support PoS.