Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor snap-sync. #2916

Merged
merged 6 commits into from
Jul 23, 2024
Merged

Refactor snap-sync. #2916

merged 6 commits into from
Jul 23, 2024

Conversation

shamil-gadelshin
Copy link
Contributor

The PR contains changes related to the upcoming snap-sync for domains:

  • support for snap-sync on target block number
  • conditional block import from the downloaded segment

New get_blocks_from_target_segment calculates the target segment for download by the optional target block (None = last segment).

Code contributor checklist:

- add optional target block
- add conditional block import
Copy link
Member

@nazar-pc nazar-pc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you describe why this is needed and how it will be used? There is answer to "what?", but not "why?" right now.

Also it'd make review significantly easier if you can extract get_blocks_from_target_segment before adding features to it, otherwise there is code moving around that makes the diff appear much larger than it is and it no longer fits the screen, so a lot of jumping back and forth trying to understand what has actually changed, also I tend to do things like last_segment_index to target_segment_index renaming done in a separate commit as well to further reduce visual noise.

let signed_block = decode_block::<Block>(&block_bytes)
.map_err(|error| format!("Failed to decode archived block: {error}"))?;
let (header, extrinsics) = signed_block.block.deconstruct();
if import_blocks_from_downloaded_segment {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the use case of not doing this though?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we name it something like import_a_single_block or similar? Right now ti indicates we do not import blocks, but we do, we just import one instead of all.

crates/subspace-service/src/sync_from_dsn/snap_sync.rs Outdated Show resolved Hide resolved
@shamil-gadelshin
Copy link
Contributor Author

Here is the original snap-sync for domains @nazar-pc

Domain snap-sync

Prerequisites

  1. MMR sync (new protocol) to support XDM and fraud proofs - from aux storage of the consensus chain client.
  2. snap-sync changes:
    • "snap-sync from a custom block"
    • state-only snap-sync
    • support for multiple snap-sync runs
  3. Domains change:
    • enable block and state request protocols for domains and disable block sync for domains (pause_sync)
    • toggle consensus blocks notifications to pause/resume domain block derivation
    • add execution receipt of the last confirmed domain block and associated consensus block fields to LatestConfirmedDomainBlock

Algorithm

  1. Snap-sync consensus chain to the head having disabled the domain chain notification (state-only).
  2. Get last_confirmed_domains_block from the consensus chain runtime state and retrieve execution receipt of the last confirmed domain block and associated consensus block fields of the LatestConfirmedDomainBlock from consensus chain runtime state.
  3. MMR-sync (must be done before the final domain blocks sync/derivation).
  4. Snap-sync consensus chain from the associated consensus block:
    • After getting the state and before syncing/derivation domain chain blocks - download state and last_confirmed_domains_block from the domain network.
    • Sync blocks for consensus chain (from DSN - because we can't guarantee other peers don't prune their local blocks at the arbitrary height) with reenabling domain chain notification.

Copy link
Member

@nazar-pc nazar-pc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above comment provides additional context, which is certainly helpful, but it answers the question "How?", it still doesn't say "Why?" or "What is the problem that we're trying to solve?".
It helps a lot if you open a PR with a problem and solution description, such that reviewer can understand what the goal is.
This is not just for me (you assumed I know everything that I need to for review), but also for auditors and community members that may review this PR.

For example recent refactoring PR #2912 didn't introduce any features at all, but explained briefly why refactoring is done and what is coming after it for some context of the changes.

What I expected in this PR is a description that looks something like this (based on my understanding after our 1:1 discussion):

This is the first step towards implementing Snap sync for domains.

One of the initial requirements for it is that we need to find the last archived block that updated last confirmed block for a particular domain and resume sync from there.
Currently Snap sync assumes we always sync first block of the last segment, but with this new requirement we need to be able to potentially purposefully sync to an older block instead and this PR updates Snap sync API to do just that.

As you can see your comment contains instructions on "How?" we do that step by step, but doesn't actually explain why would we need to do that in the first place.
This is not the first time I'm asking "Why?" and I hope example helps now that I undestand what we're trying to achieve here.


On a separate topic I thought about implementation some more and I'm not sure we need to Snap sync/import the newer block before (without notification, etc.) before syncing older block.
IIRC there should be a way to download just a subset of the state with a proof.
So potentially after you download and decode the block, you can get a state root out of it and ask peers to give you the contents of a known key and a proof that it ties back to this storage root.
I'm not sure how difficult it would be to do that, but it would avoid some additional edge-cases like when one of the future blocks is already imported (after this initial snap sync) and there is no corresponding notification about its import anymore even though domain (or something on consensus side) may actually expect consistent notifications for each block without gaps and would break in interesting ways otherwise.

crates/subspace-service/src/sync_from_dsn/snap_sync.rs Outdated Show resolved Hide resolved
let mut current_segment_index = last_segment_index;

loop {
if segment_header.last_archived_block().number == target_block
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think this is a correct check. The fact that you have a specific last archived block number doesn't mean it is contained fully in this segment. I think this segment_header.last_archived_block().number == target_block part should simply be removed, the actual correct check is done below with < target_block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check would be incorrect if we have blocks spanning more than two segments. Otherwise, we will download and import an extra segment. However, your requested change is consistent with the previous discussion about the theoretical big blocks. Please, let me know if you want to change it despite the extra segments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do want to change it. It will make no difference now and will allow us to potentially avoid very tricky bugs in the future. We do not have any inherent guarantees about block size here. Also it is not like we need to add code, we need to remove a condition that makes things easier to understand, not harder (in this particular case).

Copy link
Contributor Author

@shamil-gadelshin shamil-gadelshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above comment provides additional context, which is certainly helpful, but it answers the question "How?", it still doesn't say "Why?" or "What is the problem that we're trying to solve?".

Indeed, the original PR description misses this explanation. I provided the context after our offline discussion and I will add more information next time.

So potentially after you download and decode the block, you can get a state root out of it and ask peers to give you the contents of a known key and a proof that it ties back to this storage root.
I'm not sure how difficult it would be to do that,

It makes sense as an optimization but I'm also not sure about the implementation difficulty.

...but it would avoid some additional edge-cases like when one of the future blocks is already imported (after this initial snap sync) and there is no corresponding notification about its import anymore even though domain (or something on consensus side) may actually expect consistent notifications for each block without gaps and would break in interesting ways otherwise.

This edge case must be addressed anyway to implement the "domains snap-sync" (Prerequisites section 3.2)

let mut current_segment_index = last_segment_index;

loop {
if segment_header.last_archived_block().number == target_block
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check would be incorrect if we have blocks spanning more than two segments. Otherwise, we will download and import an extra segment. However, your requested change is consistent with the previous discussion about the theoretical big blocks. Please, let me know if you want to change it despite the extra segments.

let signed_block = decode_block::<Block>(&block_bytes)
.map_err(|error| format!("Failed to decode archived block: {error}"))?;
let (header, extrinsics) = signed_block.block.deconstruct();
if import_blocks_from_downloaded_segment {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline.

Copy link
Member

@nazar-pc nazar-pc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This edge case must be addressed anyway to implement the "domains snap-sync" (Prerequisites section 3.2)

What I meant is that with suggested way of implementing it we do not have that issue anymore because we avoid importing future block in the first place, so there will be nothing to address.

let mut current_segment_index = last_segment_index;

loop {
if segment_header.last_archived_block().number == target_block
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do want to change it. It will make no difference now and will allow us to potentially avoid very tricky bugs in the future. We do not have any inherent guarantees about block size here. Also it is not like we need to add code, we need to remove a condition that makes things easier to understand, not harder (in this particular case).

let signed_block = decode_block::<Block>(&block_bytes)
.map_err(|error| format!("Failed to decode archived block: {error}"))?;
let (header, extrinsics) = signed_block.block.deconstruct();
if import_blocks_from_downloaded_segment {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we name it something like import_a_single_block or similar? Right now ti indicates we do not import blocks, but we do, we just import one instead of all.

@shamil-gadelshin shamil-gadelshin marked this pull request as draft July 18, 2024 11:36
Comment on lines 245 to 247
/// Note: setting import_state_block_only will import only the block related to the state (a block
/// number equal or less than target_block or the first block of the last archived segment) and
/// disable importing the remaining blocks of the downloaded segment.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought target_block is target block, but this comment is saying a lower block number may be imported instead, is that really true and if so then why not importing the block that is requested as one would expect?

Copy link
Contributor Author

@shamil-gadelshin shamil-gadelshin Jul 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our current code supports the case of importing the first block of the segment with a state and possibly importing the remaining blocks later. Are you suggesting to implement: 1) import blocks before the block, 2) import the state block, 3) import the remaining blocks ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not suggesting anything, I'm asking about what was actually implemented because the comment doesn't match what I expected the code would do.

@shamil-gadelshin
Copy link
Contributor Author

The last commit removes the inconsistent feature of skipping block import using a special flag for snap-sync (following the offline discussion).

@shamil-gadelshin shamil-gadelshin marked this pull request as ready for review July 23, 2024 10:08
Copy link
Member

@nazar-pc nazar-pc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll squash-merge

@nazar-pc nazar-pc merged commit fb2311b into main Jul 23, 2024
11 checks passed
@nazar-pc nazar-pc deleted the snap-sync-refactoring branch July 23, 2024 12:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants