Refactor snap-sync. #2916

shamil-gadelshin · 2024-07-12T15:29:19Z

The PR contains changes related to the upcoming snap-sync for domains:

- support for snap-sync on target block number
- conditional block import from the downloaded segment

The new get_blocks_from_target_segment function determines which segment to download based on the optional target block (None = last segment).

Code contributor checklist:

I have read, understood and followed contributing guide

- add optional target block
- add conditional block import

nazar-pc

Can you describe why this is needed and how it will be used? There is an answer to "what?", but not to "why?" right now.

Also, it would make review significantly easier if you extracted get_blocks_from_target_segment before adding features to it. Otherwise code moves around, which makes the diff appear much larger than it is; it no longer fits on the screen, so there is a lot of jumping back and forth to understand what has actually changed. I also tend to do renames such as last_segment_index to target_segment_index in a separate commit to further reduce visual noise.

nazar-pc · 2024-07-15T05:02:56Z

crates/subspace-service/src/sync_from_dsn/snap_sync.rs

-        let signed_block = decode_block::<Block>(&block_bytes)
-            .map_err(|error| format!("Failed to decode archived block: {error}"))?;
-        let (header, extrinsics) = signed_block.block.deconstruct();
+    if import_blocks_from_downloaded_segment {


What is the use case of not doing this though?

shamil-gadelshin · 2024-07-15T10:21:45Z

Here is the original snap-sync for domains @nazar-pc

Domain snap-sync

Prerequisites

1. MMR sync (new protocol) to support XDM and fraud proofs - from aux storage of the consensus chain client.
2. snap-sync changes:
   - "snap-sync from a custom block"
   - state-only snap-sync
   - support for multiple snap-sync runs
3. Domains change:
   - enable block and state request protocols for domains and disable block sync for domains (pause_sync)
   - toggle consensus blocks notifications to pause/resume domain block derivation
   - add execution receipt of the last confirmed domain block and associated consensus block fields to LatestConfirmedDomainBlock

Algorithm

1. Snap-sync the consensus chain to the head with domain chain notifications disabled (state-only).
2. Get last_confirmed_domains_block from the consensus chain runtime state, and retrieve the execution receipt of the last confirmed domain block and the associated consensus block fields of LatestConfirmedDomainBlock from the same runtime state.
3. MMR-sync (must be done before the final domain block sync/derivation).
4. Snap-sync the consensus chain from the associated consensus block:
   - After getting the state and before syncing/deriving domain chain blocks, download the state and last_confirmed_domains_block from the domain network.
   - Sync blocks for the consensus chain (from DSN, because we can't guarantee that other peers haven't pruned their local blocks at an arbitrary height) and re-enable domain chain notifications.
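The ordering of the algorithm above can be sketched as follows. This is only an illustration: the DomainSnapSyncOps trait, its methods, and the Recorder type are hypothetical names invented for this sketch and do not come from the Subspace codebase; the block number 100 is an arbitrary placeholder.

```rust
/// Hypothetical hooks for the steps of the algorithm. None of these names
/// come from the Subspace codebase; they only make the ordering explicit.
trait DomainSnapSyncOps {
    fn set_domain_notifications(&mut self, enabled: bool);
    /// Snap-syncs the consensus chain; `None` means "sync to the head".
    fn snap_sync_consensus(&mut self, target_block: Option<u32>);
    /// Consensus block associated with the last confirmed domain block.
    fn associated_consensus_block(&self) -> u32;
    fn mmr_sync(&mut self);
    fn download_domain_state(&mut self);
    fn sync_consensus_blocks_from_dsn(&mut self);
}

fn domain_snap_sync<O: DomainSnapSyncOps>(ops: &mut O) {
    // Step 1: state-only snap-sync to the head, domain notifications off.
    ops.set_domain_notifications(false);
    ops.snap_sync_consensus(None);
    // Step 2: read the associated consensus block from the runtime state.
    let associated_block = ops.associated_consensus_block();
    // Step 3: MMR sync must finish before domain block derivation.
    ops.mmr_sync();
    // Step 4: snap-sync again from the associated (older) consensus block,
    // fetch the domain state, then resume notifications and sync from DSN.
    ops.snap_sync_consensus(Some(associated_block));
    ops.download_domain_state();
    ops.set_domain_notifications(true);
    ops.sync_consensus_blocks_from_dsn();
}

/// Test double that records the order of calls.
#[derive(Default)]
struct Recorder {
    log: Vec<String>,
}

impl DomainSnapSyncOps for Recorder {
    fn set_domain_notifications(&mut self, enabled: bool) {
        self.log.push(format!("notifications={enabled}"));
    }
    fn snap_sync_consensus(&mut self, target_block: Option<u32>) {
        self.log.push(format!("snap_sync({target_block:?})"));
    }
    fn associated_consensus_block(&self) -> u32 {
        100 // arbitrary placeholder value
    }
    fn mmr_sync(&mut self) {
        self.log.push("mmr_sync".to_string());
    }
    fn download_domain_state(&mut self) {
        self.log.push("domain_state".to_string());
    }
    fn sync_consensus_blocks_from_dsn(&mut self) {
        self.log.push("dsn_blocks".to_string());
    }
}
```

The second snap_sync_consensus call is the reason the snap-sync API needs an optional target block: the first run goes to the head, the second one deliberately goes to an older block.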

Co-authored-by: Nazar Mokrynskyi <[email protected]>

nazar-pc

Above comment provides additional context, which is certainly helpful, but it answers the question "How?", it still doesn't say "Why?" or "What is the problem that we're trying to solve?".
It helps a lot if you open a PR with a description of the problem and the solution, so that the reviewer can understand what the goal is.
This is not just for me (you assumed I know everything I need for the review), but also for auditors and community members who may review this PR.

For example, the recent refactoring PR #2912 didn't introduce any features at all, but briefly explained why the refactoring is done and what is coming after it, to give some context for the changes.

What I expected in this PR is a description that looks something like this (based on my understanding after our 1:1 discussion):

This is the first step towards implementing Snap sync for domains.

One of the initial requirements for it is that we need to find the last archived block that updated the last confirmed block for a particular domain and resume sync from there.
Currently Snap sync assumes we always sync the first block of the last segment, but with this new requirement we need to be able to deliberately sync to an older block instead, and this PR updates the Snap sync API to do just that.

As you can see, your comment contains instructions on "How?" we do that step by step, but doesn't actually explain why we would need to do that in the first place.
This is not the first time I'm asking "Why?", and I hope the example helps now that I understand what we're trying to achieve here.

On a separate topic, I thought about the implementation some more, and I'm not sure we need to Snap sync/import the newer block (without notification, etc.) before syncing the older block.
IIRC there should be a way to download just a subset of the state with a proof.
So potentially after you download and decode the block, you can get a state root out of it and ask peers to give you the contents of a known key and a proof that it ties back to this storage root.
I'm not sure how difficult it would be to do that, but it would avoid some additional edge-cases like when one of the future blocks is already imported (after this initial snap sync) and there is no corresponding notification about its import anymore even though domain (or something on consensus side) may actually expect consistent notifications for each block without gaps and would break in interesting ways otherwise.
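The edge case mentioned above (a consumer that expects one import notification per block, with no gaps) can be illustrated with a small sketch. NotificationTracker and its observe method are hypothetical names used only for illustration, not part of Subspace or Substrate:

```rust
use std::ops::Range;

/// Hypothetical consumer that expects an import notification for every
/// consecutive block number.
struct NotificationTracker {
    last_seen: Option<u32>,
}

impl NotificationTracker {
    fn new() -> Self {
        Self { last_seen: None }
    }

    /// Records a notification for block `number`. Returns the range of
    /// block numbers that were skipped, if any.
    fn observe(&mut self, number: u32) -> Option<Range<u32>> {
        let gap = match self.last_seen {
            // A jump of more than one block means notifications were missed.
            Some(last) if number > last.saturating_add(1) => Some(last + 1..number),
            _ => None,
        };
        // Never move backwards if an older block is reported again.
        self.last_seen = Some(self.last_seen.map_or(number, |last| last.max(number)));
        gap
    }
}
```

If blocks 10 and 11 are notified and the next notification is for block 15, observe reports the missing range 12..15; a block imported silently during an earlier snap sync would show up to such a consumer in exactly this way.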

nazar-pc · 2024-07-15T23:31:15Z

crates/subspace-service/src/sync_from_dsn/snap_sync.rs

+            let mut current_segment_index = last_segment_index;
+
+            loop {
+                if segment_header.last_archived_block().number == target_block


I do not think this is a correct check. The fact that you have a specific last archived block number doesn't mean it is contained fully in this segment. I think this segment_header.last_archived_block().number == target_block part should simply be removed; the actual correct check is done below with < target_block.

shamil-gadelshin

> Above comment provides additional context, which is certainly helpful, but it answers the question "How?", it still doesn't say "Why?" or "What is the problem that we're trying to solve?".

Indeed, the original PR description misses this explanation. I provided the context after our offline discussion and I will add more information next time.

> So potentially after you download and decode the block, you can get a state root out of it and ask peers to give you the contents of a known key and a proof that it ties back to this storage root.
> I'm not sure how difficult it would be to do that,

It makes sense as an optimization but I'm also not sure about the implementation difficulty.

> ...but it would avoid some additional edge-cases like when one of the future blocks is already imported (after this initial snap sync) and there is no corresponding notification about its import anymore even though domain (or something on consensus side) may actually expect consistent notifications for each block without gaps and would break in interesting ways otherwise.

This edge case must be addressed anyway to implement the "domains snap-sync" (Prerequisites section 3.2)

shamil-gadelshin · 2024-07-17T11:30:01Z

crates/subspace-service/src/sync_from_dsn/snap_sync.rs

+            let mut current_segment_index = last_segment_index;
+
+            loop {
+                if segment_header.last_archived_block().number == target_block


The check would be incorrect if we have blocks spanning more than two segments. Otherwise, we will download and import an extra segment. However, your requested change is consistent with the previous discussion about the theoretical big blocks. Please, let me know if you want to change it despite the extra segments.

shamil-gadelshin · 2024-07-17T11:31:28Z

crates/subspace-service/src/sync_from_dsn/snap_sync.rs

-        let signed_block = decode_block::<Block>(&block_bytes)
-            .map_err(|error| format!("Failed to decode archived block: {error}"))?;
-        let (header, extrinsics) = signed_block.block.deconstruct();
+    if import_blocks_from_downloaded_segment {


Discussed offline.

nazar-pc

> This edge case must be addressed anyway to implement the "domains snap-sync" (Prerequisites section 3.2)

What I meant is that with the suggested way of implementing it we do not have that issue anymore, because we avoid importing a future block in the first place, so there will be nothing to address.

nazar-pc · 2024-07-18T08:22:47Z

crates/subspace-service/src/sync_from_dsn/snap_sync.rs

+            let mut current_segment_index = last_segment_index;
+
+            loop {
+                if segment_header.last_archived_block().number == target_block


I do want to change it. It will make no difference now and will allow us to potentially avoid very tricky bugs in the future. We do not have any inherent guarantees about block size here. Also, it is not as if we need to add code; we only need to remove a condition, which makes things easier to understand, not harder (in this particular case).
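The check discussed in this thread can be illustrated with a simplified sketch. It assumes a hypothetical SegmentHeader stand-in that exposes only a segment index and the number of its last archived block (which may be only partially contained in that segment), and a hypothetical helper find_target_segment; neither is the actual Subspace API or the code from this PR. Walking backwards from the last segment, the search stops only on a strict "less than target_block" comparison:

```rust
/// Hypothetical, simplified stand-in for a segment header.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SegmentHeader {
    segment_index: u64,
    /// Number of the last block archived in this segment. The block may be
    /// only partially contained here and continue in later segments.
    last_archived_block_number: u32,
}

/// Returns the index of the segment to download. `None` as the target
/// selects the last segment. `headers` must be ordered by segment index.
fn find_target_segment(headers: &[SegmentHeader], target_block: Option<u32>) -> Option<u64> {
    let last = headers.last()?;
    let target_block = match target_block {
        Some(number) => number,
        None => return Some(last.segment_index),
    };
    // The target block has not been archived yet.
    if target_block > last.last_archived_block_number {
        return None;
    }

    let mut candidate = last.segment_index;
    // Walk backwards and stop only on a strict `<` comparison. Equality is
    // not a stopping condition, because a block with that number may span
    // several segments.
    for header in headers.iter().rev() {
        if header.last_archived_block_number < target_block {
            break;
        }
        candidate = header.segment_index;
    }
    Some(candidate)
}
```

With headers whose last archived block numbers are 10, 20, 20 and 25 (block 20 spread over segments 1 to 3), a target of 20 resolves to segment 1, the earliest segment that contains part of the block. A backwards walk that also stopped on equality would match segment 2 first.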

nazar-pc · 2024-07-18T08:24:27Z

crates/subspace-service/src/sync_from_dsn/snap_sync.rs

-        let signed_block = decode_block::<Block>(&block_bytes)
-            .map_err(|error| format!("Failed to decode archived block: {error}"))?;
-        let (header, extrinsics) = signed_block.block.deconstruct();
+    if import_blocks_from_downloaded_segment {


Can we name it something like import_a_single_block or similar? Right now it indicates we do not import blocks, but we do; we just import one instead of all.

nazar-pc · 2024-07-18T11:38:12Z

crates/subspace-service/src/sync_from_dsn/snap_sync.rs

+/// Note: setting import_state_block_only will import only the block related to the state (a block
+/// number equal or less than target_block or the first block of the last archived segment) and
+/// disable importing the remaining blocks of the downloaded segment.


I thought target_block is the target block, but this comment says a lower block number may be imported instead. Is that really true, and if so, why not import the block that is requested, as one would expect?

Our current code supports the case of importing the first block of the segment with a state and possibly importing the remaining blocks later. Are you suggesting that we implement: 1) import blocks before the block, 2) import the state block, 3) import the remaining blocks?

I'm not suggesting anything, I'm asking about what was actually implemented because the comment doesn't match what I expected the code would do.

shamil-gadelshin · 2024-07-23T10:07:58Z

The last commit removes the inconsistent feature of skipping block import using a special flag for snap-sync (following the offline discussion).

nazar-pc

I'll squash-merge

Refactor snap-sync.

ab5105f

- add optional target block
- add conditional block import

shamil-gadelshin requested review from nazar-pc and rg3l3dr as code owners July 12, 2024 15:29

nazar-pc reviewed Jul 15, 2024

Update crates/subspace-service/src/sync_from_dsn/snap_sync.rs

0dd42d0

Co-authored-by: Nazar Mokrynskyi <[email protected]>

nazar-pc reviewed Jul 15, 2024

Change condition for target_block (snap-sync).

00d4f91

shamil-gadelshin commented Jul 17, 2024

shamil-gadelshin mentioned this pull request Jul 17, 2024

Optimize archiving during sync #2927

Merged

1 task

nazar-pc reviewed Jul 18, 2024

shamil-gadelshin marked this pull request as draft July 18, 2024 11:36

nazar-pc reviewed Jul 18, 2024

Refactor snap-sync.

43d991e

shamil-gadelshin force-pushed the snap-sync-refactoring branch from 5007ab7 to 43d991e Compare July 18, 2024 11:38

shamil-gadelshin added 2 commits July 23, 2024 14:05

Remove import_state_block_only variable

57c7ef6

Merge branch 'main' into snap-sync-refactoring

036c804

shamil-gadelshin marked this pull request as ready for review July 23, 2024 10:08

shamil-gadelshin requested a review from nazar-pc July 23, 2024 10:09

nazar-pc approved these changes Jul 23, 2024

nazar-pc merged commit fb2311b into main Jul 23, 2024
11 checks passed

nazar-pc deleted the snap-sync-refactoring branch July 23, 2024 12:05
