Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sozo): auto chunk blocks fetching world data #2929

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

okhaimie-dev
Copy link

@okhaimie-dev okhaimie-dev commented Jan 20, 2025

Description

Related issue

#2825

Tests

  • Yes
  • No, because they aren't needed
  • No, because I need help

Added to documentation?

  • README.md
  • Dojo Book
  • No documentation needed

Checklist

  • I've formatted my code (scripts/prettier.sh, scripts/rust_fmt.sh, scripts/cairo_fmt.sh)
  • I've linted my code (scripts/clippy.sh, scripts/docs.sh)
  • I've commented my code
  • I've requested a review after addressing the comments

Summary by CodeRabbit

  • New Features

    • Improved event retrieval process with support for large block ranges.
    • Added chunked event processing to enhance performance and reliability.
  • Performance

    • Optimized event retrieval by implementing block range segmentation.
    • Introduced a maximum block range limit to prevent potential retrieval issues.

Copy link

coderabbitai bot commented Jan 20, 2025

Walkthrough

Ohayo, sensei! The changes in the events command focus on enhancing event retrieval efficiency by introducing a maximum block range of 50,000 blocks. The new implementation processes events in chunks, ensuring more controlled and predictable fetching. Instead of handling events in a single batch, the code now breaks down the retrieval into manageable ranges, adjusting the from_block and to_block parameters based on the latest block number from the provider.

Changes

File Change Summary
bin/sozo/src/commands/events.rs - Added MAX_BLOCK_RANGE constant set to 50,000
- Implemented chunked event retrieval logic
- Modified block range handling to ensure efficient event processing

Sequence Diagram

sequenceDiagram
    participant User
    participant EventCommand
    participant Provider
    
    User->>EventCommand: Request events
    EventCommand->>Provider: Get latest block number
    Provider-->>EventCommand: Return latest block number
    EventCommand->>EventCommand: Adjust from_block and to_block
    loop Block Range Chunks
        EventCommand->>Provider: Retrieve events for chunk
        Provider-->>EventCommand: Return events
        EventCommand->>User: Display events and continuation token
    end
Loading

📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 77bee69 and 70532e8.

📒 Files selected for processing (1)
  • bin/sozo/src/commands/events.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • bin/sozo/src/commands/events.rs

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
bin/sozo/src/commands/events.rs (3)

20-20: Consider making MAX_BLOCK_RANGE configurable

Defining MAX_BLOCK_RANGE as a constant is fine, but making it a configurable parameter would enhance flexibility. This allows users to adjust the block range without modifying the code, improving usability in different scenarios.


68-73: Simplify from_block assignment

The nested match statements for assigning from_block can be simplified since BlockId::Number(block) will always match BlockId::Number(num). Here's a streamlined version:

-let from_block = match self.from_block {
-    Some(block) => match BlockId::Number(block) {
-        BlockId::Number(num) => num,
-        _ => latest_block.saturating_sub(1000),
-    },
-    None => latest_block.saturating_sub(1000),
-};
+let from_block = self.from_block.unwrap_or_else(|| latest_block.saturating_sub(1000));

75-80: Simplify to_block assignment

Similarly, the assignment of to_block can be simplified for clarity:

-let to_block = match self.to_block {
-    Some(block) => match BlockId::Number(block) {
-        BlockId::Number(num) => num,
-        _ => latest_block,
-    },
-    None => latest_block,
-};
+let to_block = self.to_block.unwrap_or(latest_block);
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2dd3ead and 5074ac5.

📒 Files selected for processing (1)
  • bin/sozo/src/commands/events.rs (2 hunks)
🔇 Additional comments (1)
bin/sozo/src/commands/events.rs (1)

85-131: Verify block chunking logic for off-by-one errors

Please verify that the calculation of chunk_end and the update of current_start correctly process all blocks without overlap or gaps. Specifically, ensure that all blocks from from_block to to_block are processed exactly once.

Here's a script to check the block ranges:

✅ Verification successful

Ohayo! Block chunking logic is implemented correctly sensei!

The verification confirms that:

  • All blocks are processed exactly once
  • No gaps or overlaps exist between chunks
  • Edge cases, including the final block, are handled properly
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for overlaps or gaps in block ranges.

START_BLOCK=1000
END_BLOCK=2000
MAX_BLOCK_RANGE=500

current_start=$START_BLOCK

while [ $current_start -le $END_BLOCK ]; do
  chunk_end=$((current_start + MAX_BLOCK_RANGE - 1))
  if [ $chunk_end -gt $END_BLOCK ]; then
    chunk_end=$END_BLOCK
  fi
  echo "Processing blocks from $current_start to $chunk_end"
  current_start=$((chunk_end + 1))
done

Length of output: 620

Comment on lines +126 to +131
if let Some(continuation_token) = res.continuation_token {
println!("Continuation token: {:?}", continuation_token);
println!("----------------------------------------------");
}

current_start = chunk_end + 1;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Handle continuation_token for paginated event retrieval

When retrieving events, if a continuation_token is returned, it indicates more events are available within the current block range. Currently, the code prints the continuation_token but doesn't use it to fetch additional events. To ensure all events are retrieved, loop until res.continuation_token is None, using the token in subsequent get_events calls.

Would you like me to provide a revised implementation that correctly handles the continuation_token?

@glihm glihm changed the title feat ( sozo ): removed world block feat(sozo): removed world block Jan 21, 2025
@glihm glihm changed the title feat(sozo): removed world block feat(sozo): auto chunk blocks fetching world data Jan 21, 2025
Copy link
Collaborator

@glihm glihm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@okhaimie-dev thanks for the work here. Also, I'll update the issue, since we have other places where the world_block is actually used.

@@ -17,6 +17,8 @@ use super::options::starknet::StarknetOptions;
use super::options::world::WorldOptions;
use crate::utils;

const MAX_BLOCK_RANGE: u64 = 50_000;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment mentioning that it's an arbitrary value for now that seems to work (and we need to validate that too).

} else {
self.from_block.map(BlockId::Number)
};
// Get latest block number
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Get latest block number

Don't use comment when it's self explanatory. 👍

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright

Comment on lines 67 to 71
let from_block = match self.from_block {
Some(block) => match BlockId::Number(block) {
BlockId::Number(num) => num,
_ => latest_block.saturating_sub(1000),
},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep using the world_block from the config, since we still want to start at a later block to avoid indexing the whole blockchain from block 0.

BlockId::Number(num) => num,
_ => latest_block.saturating_sub(1000),
},
None => latest_block.saturating_sub(1000),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to have a reasonable default window to start fetching event when the event is used. And since you recommend to remove the world_block I figured starting from around 24hr from the latest block would be reasonable.

What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, so if there's a world block, we take it. If not, you would start 1000 blocks before the latest?

In this case, it may be better starting to 0 actually. Like so, the default behavior is something easy to grasp for user. WDYT?

Copy link
Author

@okhaimie-dev okhaimie-dev Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it will be better to start from block 0 because of how long it will take if they are fetching events from mainnet or sepolia, but I see what you are saying. Maybe it will be worth to leave a note in the sozo event doc to let user know the default behavior of starting to fetch from blocks produced 24hrs prior if a from_block is not provided

#[derive(Debug, Args)]
pub struct EventsArgs {
    #[arg(help = "List of specific events to be filtered")]
    #[arg(value_delimiter = ',')]
    pub events: Option<Vec<String>>,

    #[arg(short, long)]
    #[arg(help = "Block number from where to look for events")]
    pub from_block: Option<u64>,

    #[arg(short, long)]
    #[arg(help = "Block number until where to look for events")]
    pub to_block: Option<u64>,

If not, I will go with your recommendation and switch it 0 instead.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
bin/sozo/src/commands/events.rs (1)

110-114: ⚠️ Potential issue

Handle continuation token for complete event retrieval.

The code prints the continuation token but doesn't use it to fetch remaining events within the current block range.

For complete event retrieval, we should process all events within each chunk before moving to the next block range. Would you like me to provide an implementation that correctly handles the continuation token?

🧹 Nitpick comments (1)
bin/sozo/src/commands/events.rs (1)

82-85: Ohayo sensei! Add retry mechanism for RPC calls.

The event fetching could fail due to transient network issues. Consider implementing a retry mechanism with exponential backoff for better reliability.

Here's a suggested implementation using tokio-retry:

+use tokio_retry::strategy::{ExponentialBackoff, jitter};
+use tokio_retry::Retry;
+
+            let retry_strategy = ExponentialBackoff::from_millis(100)
+                .map(jitter)
+                .take(3);
+
-                let res = provider
+                let res = Retry::spawn(retry_strategy, || provider
                     .get_events(event_filter, self.continuation_token.clone(), self.chunk_size)
-                    .await?;
+                    .boxed())
+                    .await?;
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 050c2ee and 77bee69.

📒 Files selected for processing (1)
  • bin/sozo/src/commands/events.rs (2 hunks)
🔇 Additional comments (1)
bin/sozo/src/commands/events.rs (1)

19-21: Ohayo sensei! Let's validate the MAX_BLOCK_RANGE through benchmarking.

The constant value of 50,000 blocks needs validation through benchmarking to ensure optimal performance across different network conditions and load scenarios.

Run this script to analyze historical event fetching performance:

✅ Verification successful

Ohayo sensei! The MAX_BLOCK_RANGE value of 50,000 is reasonable as a default.

After analyzing similar implementations in Katana RPC and Saya, this value appears to be a good balance between performance and node load. The chunking mechanism allows for efficient pagination while preventing overwhelming the node.

  • Consider making this configurable via CLI or config file for different deployment scenarios, similar to Saya's approach
  • Keep the "pending benchmarking" comment as it correctly indicates this value might need adjustment based on specific network conditions
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Analyze event fetching performance with different block ranges
# Test ranges: 10k, 25k, 50k, 75k, 100k blocks

for range in 10000 25000 50000 75000 100000; do
    echo "Testing block range: $range"
    time cargo run --bin sozo events \
        --from-block $(($(cat .block_number) - range)) \
        --to-block $(cat .block_number) 2>&1 | \
        grep -E "real|Continuation token"
    echo "----------------------------------------"
done

Length of output: 2357


Script:

#!/bin/bash
# Search for benchmark files and similar constants
echo "=== Searching for benchmark files ==="
fd bench
echo -e "\n=== Searching for similar block range constants ==="
rg "BLOCK_RANGE|block_range" -A 2
echo -e "\n=== Looking for event-related test files ==="
fd "test.*events"

Length of output: 5128


Script:

#!/bin/bash
# Check contents of benchmark files
echo "=== Checking sozo benchmarks ==="
cat bin/sozo/benches/* 2>/dev/null || echo "No benchmark files found"
echo -e "\n=== Checking if events.rs has tests ==="
find bin/sozo/src -type f -name "*test*.rs" -exec grep -l "events" {} \;

Length of output: 1881

Comment on lines +65 to +68
let latest_block = provider.block_number().await?;
let from_block = self.from_block.unwrap_or_else(|| latest_block.saturating_sub(1000));
let to_block = self.to_block.unwrap_or(latest_block);

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider world_block configuration for initial block range.

Based on previous discussions, we should consider using world_block from the configuration when available, before falling back to the 1000-block window. This ensures consistency with the world configuration while maintaining the reasonable default window.

Apply this diff to incorporate world_block:

-            let from_block = self.from_block.unwrap_or_else(|| latest_block.saturating_sub(1000));
+            let from_block = self.from_block.unwrap_or_else(|| {
+                world_diff.world_info
+                    .world_block
+                    .unwrap_or_else(|| latest_block.saturating_sub(1000))
+            });

Committable suggestion skipped: line range outside the PR's diff.

Copy link

codecov bot commented Jan 22, 2025

Codecov Report

Attention: Patch coverage is 0% with 42 lines in your changes missing coverage. Please review.

Project coverage is 55.74%. Comparing base (8f54b66) to head (77bee69).

Files with missing lines Patch % Lines
bin/sozo/src/commands/events.rs 0.00% 42 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2929      +/-   ##
==========================================
- Coverage   55.75%   55.74%   -0.01%     
==========================================
  Files         445      445              
  Lines       57627    57629       +2     
==========================================
- Hits        32129    32126       -3     
- Misses      25498    25503       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

bin/sozo/src/commands/events.rs Outdated Show resolved Hide resolved
keys,
};
let latest_block = provider.block_number().await?;
let from_block = self.from_block.unwrap_or_else(|| latest_block.saturating_sub(1000));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As edited in the issue, please consider using the world_block from the config, if it exists. 👍

Comment on lines +110 to +115
if let Some(continuation_token) = res.continuation_token {
println!("Continuation token: {:?}", continuation_token);
println!("----------------------------------------------");
}

current_start = chunk_end + 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't just continue here, since the continuation token will indicate that there are more events to process. However in the new logic, since we're not stopping, those events would be lost.

The logic here could change to automatically fetch all the pages (as we do in the world_from_events function), and then display all the events.
Doing so, the continuation token is no more required to be exposed to the user. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants