feat(sozo): auto chunk blocks fetching world data #2929

okhaimie-dev · 2025-01-20T01:26:34Z

Description

Related issue

#2825

Tests

Yes
No, because they aren't needed
No, because I need help

Added to documentation?

README.md
Dojo Book
No documentation needed

Checklist

I've formatted my code (scripts/prettier.sh, scripts/rust_fmt.sh, scripts/cairo_fmt.sh)
I've linted my code (scripts/clippy.sh, scripts/docs.sh)
I've commented my code
I've requested a review after addressing the comments

Summary by CodeRabbit

New Features
- Improved event retrieval process with support for large block ranges.
- Added chunked event processing to enhance performance and reliability.
Performance
- Optimized event retrieval by implementing block range segmentation.
- Introduced a maximum block range limit to prevent potential retrieval issues.

coderabbitai · 2025-01-20T01:28:39Z

Walkthrough

Ohayo, sensei! The changes in the events command focus on enhancing event retrieval efficiency by introducing a maximum block range of 50,000 blocks. The new implementation processes events in chunks, ensuring more controlled and predictable fetching. Instead of handling events in a single batch, the code now breaks down the retrieval into manageable ranges, adjusting the from_block and to_block parameters based on the latest block number from the provider.

Changes

| File | Change Summary |
|---|---|
| `bin/sozo/src/commands/events.rs` | Added `MAX_BLOCK_RANGE` constant set to 50,000; implemented chunked event retrieval logic; modified block range handling to ensure efficient event processing |

Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant EventCommand
    participant Provider

    User->>EventCommand: Request events
    EventCommand->>Provider: Get latest block number
    Provider-->>EventCommand: Return latest block number
    EventCommand->>EventCommand: Adjust from_block and to_block
    loop Block Range Chunks
        EventCommand->>Provider: Retrieve events for chunk
        Provider-->>EventCommand: Return events
        EventCommand->>User: Display events and continuation token
    end
```

📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 77bee69 and 70532e8.

📒 Files selected for processing (1)

bin/sozo/src/commands/events.rs (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

bin/sozo/src/commands/events.rs


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

bin/sozo/src/commands/events.rs (3)
20-20: Consider making MAX_BLOCK_RANGE configurable

Defining MAX_BLOCK_RANGE as a constant is fine, but making it a configurable parameter would enhance flexibility. This allows users to adjust the block range without modifying the code, improving usability in different scenarios.

68-73: Simplify from_block assignment

The nested match statements for assigning from_block can be simplified since BlockId::Number(block) will always match BlockId::Number(num). Here's a streamlined version:
```diff
-let from_block = match self.from_block {
-    Some(block) => match BlockId::Number(block) {
-        BlockId::Number(num) => num,
-        _ => latest_block.saturating_sub(1000),
-    },
-    None => latest_block.saturating_sub(1000),
-};
+let from_block = self.from_block.unwrap_or_else(|| latest_block.saturating_sub(1000));
```
75-80: Simplify to_block assignment

Similarly, the assignment of to_block can be simplified for clarity:
```diff
-let to_block = match self.to_block {
-    Some(block) => match BlockId::Number(block) {
-        BlockId::Number(num) => num,
-        _ => latest_block,
-    },
-    None => latest_block,
-};
+let to_block = self.to_block.unwrap_or(latest_block);
```

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2dd3ead and 5074ac5.

📒 Files selected for processing (1)

bin/sozo/src/commands/events.rs (2 hunks)

🔇 Additional comments (1)

bin/sozo/src/commands/events.rs (1)

85-131: Verify block chunking logic for off-by-one errors

Please verify that the calculation of chunk_end and the update of current_start correctly process all blocks without overlap or gaps. Specifically, ensure that all blocks from from_block to to_block are processed exactly once.

Here's a script to check the block ranges:

✅ Verification successful

Ohayo! The block chunking logic is implemented correctly, sensei!

The verification confirms that:

- All blocks are processed exactly once.
- No gaps or overlaps exist between chunks.
- Edge cases, including the final block, are handled properly.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

```bash
#!/bin/bash
# Description: Check for overlaps or gaps in block ranges.

START_BLOCK=1000
END_BLOCK=2000
MAX_BLOCK_RANGE=500

current_start=$START_BLOCK

while [ $current_start -le $END_BLOCK ]; do
  chunk_end=$((current_start + MAX_BLOCK_RANGE - 1))
  if [ $chunk_end -gt $END_BLOCK ]; then
    chunk_end=$END_BLOCK
  fi
  echo "Processing blocks from $current_start to $chunk_end"
  current_start=$((chunk_end + 1))
done
```
Length of output: 620
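For reference, the inclusive-range arithmetic the script checks can be sketched in Rust. This is a minimal, synchronous sketch: `block_chunks` is a hypothetical helper name (not a function from `events.rs`), and it only computes the ranges rather than fetching anything.

```rust
/// Split the inclusive block range `from..=to` into consecutive inclusive
/// chunks of at most `max_range` blocks each.
fn block_chunks(from: u64, to: u64, max_range: u64) -> Vec<(u64, u64)> {
    assert!(max_range > 0, "max_range must be non-zero");
    let mut chunks = Vec::new();
    let mut current_start = from;
    while current_start <= to {
        // Same arithmetic as the script: end = start + range - 1, clamped to `to`.
        // saturating_add keeps this from overflowing near u64::MAX.
        let chunk_end = current_start.saturating_add(max_range - 1).min(to);
        chunks.push((current_start, chunk_end));
        if chunk_end == u64::MAX {
            break; // `chunk_end + 1` would overflow, and nothing is left anyway.
        }
        current_start = chunk_end + 1;
    }
    chunks
}
```

For example, `block_chunks(1000, 2000, 500)` returns `[(1000, 1499), (1500, 1999), (2000, 2000)]`, the same ranges the script prints; an empty input range (`from > to`) yields no chunks.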

coderabbitai · 2025-01-20T01:28:42Z

bin/sozo/src/commands/events.rs

```diff
+                if let Some(continuation_token) = res.continuation_token {
+                    println!("Continuation token: {:?}", continuation_token);
+                    println!("----------------------------------------------");
+                }
+
+                current_start = chunk_end + 1;
```


⚠️ Potential issue

Handle continuation_token for paginated event retrieval

When retrieving events, if a continuation_token is returned, it indicates more events are available within the current block range. Currently, the code prints the continuation_token but doesn't use it to fetch additional events. To ensure all events are retrieved, loop until res.continuation_token is None, using the token in subsequent get_events calls.

Would you like me to provide a revised implementation that correctly handles the continuation_token?
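The pagination loop described above can be sketched as follows. This is a synchronous stand-in under stated assumptions: `EventsPage`, `PagedSource`, `VecSource` and `fetch_all_pages` are hypothetical names, events are represented only by their block numbers, and the real provider call in the PR is an async `get_events(event_filter, continuation_token, chunk_size)`.

```rust
/// Hypothetical stand-in for one page of an events response: the events found
/// (here just their block numbers) plus an optional token for the next page.
struct EventsPage {
    events: Vec<u64>,
    continuation_token: Option<String>,
}

/// Hypothetical synchronous stand-in for the provider's paged `get_events` call.
trait PagedSource {
    fn get_events(&self, from: u64, to: u64, token: Option<String>, chunk_size: usize) -> EventsPage;
}

/// Fetch every page for one block range, feeding each continuation token back
/// into the next request until the source stops returning one.
fn fetch_all_pages<S: PagedSource>(source: &S, from: u64, to: u64, chunk_size: usize) -> Vec<u64> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut events = Vec::new();
    let mut token: Option<String> = None;
    loop {
        let page = source.get_events(from, to, token.take(), chunk_size);
        events.extend(page.events);
        token = page.continuation_token;
        if token.is_none() {
            break; // No token means this range has no more pages.
        }
    }
    events
}

/// In-memory source for trying the loop out: the token is simply the index of
/// the next unread event within the requested range.
struct VecSource {
    event_blocks: Vec<u64>,
}

impl PagedSource for VecSource {
    fn get_events(&self, from: u64, to: u64, token: Option<String>, chunk_size: usize) -> EventsPage {
        let in_range: Vec<u64> = self
            .event_blocks
            .iter()
            .copied()
            .filter(|block| (from..=to).contains(block))
            .collect();
        let offset = token.map_or(0, |t| t.parse::<usize>().expect("token is an index"));
        let end = (offset + chunk_size).min(in_range.len());
        let continuation_token = (end < in_range.len()).then(|| end.to_string());
        EventsPage { events: in_range[offset..end].to_vec(), continuation_token }
    }
}
```

With ten events in range and `chunk_size` 3, a single request returns only the first three; the loop keeps requesting until all ten have been collected.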

glihm

@okhaimie-dev thanks for the work here. Also, I'll update the issue, since we have other places where the world_block is actually used.

glihm · 2025-01-21T22:55:52Z

bin/sozo/src/commands/events.rs

```diff
@@ -17,6 +17,8 @@ use super::options::starknet::StarknetOptions;
 use super::options::world::WorldOptions;
 use crate::utils;

+const MAX_BLOCK_RANGE: u64 = 50_000;
```


Add a comment mentioning that it's an arbitrary value for now that seems to work (and we need to validate that too).

glihm · 2025-01-21T22:56:11Z

bin/sozo/src/commands/events.rs

```diff
-            } else {
-                self.from_block.map(BlockId::Number)
-            };
+            // Get latest block number
```


Suggested change

// Get latest block number

Don't use comments when the code is self-explanatory. 👍

glihm · 2025-01-21T22:58:44Z

bin/sozo/src/commands/events.rs

```diff
+            let from_block = match self.from_block {
+                Some(block) => match BlockId::Number(block) {
+                    BlockId::Number(num) => num,
+                    _ => latest_block.saturating_sub(1000),
+                },
```


Let's keep using the world_block from the config, since we still want to start at a later block to avoid indexing the whole blockchain from block 0.

glihm · 2025-01-21T23:00:26Z

bin/sozo/src/commands/events.rs

```diff
+                    BlockId::Number(num) => num,
+                    _ => latest_block.saturating_sub(1000),
+                },
+                None => latest_block.saturating_sub(1000),
```


The idea was to have a reasonable default window to start fetching events from when the `events` command is used. Since you recommended removing the world_block, I figured that starting around 24 hours before the latest block would be reasonable.

What do you think?

Oh I see, so if there's a world block, we take it. If not, you would start 1000 blocks before the latest?

In that case, it may actually be better to start at 0. That way, the default behavior is easy for users to grasp. WDYT?

I don't think starting from block 0 would be better, because of how long it would take to fetch events from Mainnet or Sepolia, but I see what you are saying. It may be worth leaving a note in the sozo events doc to let users know that, if no from_block is provided, the default is to start fetching from blocks produced about 24 hours earlier:

```rust
#[derive(Debug, Args)]
pub struct EventsArgs {
    #[arg(help = "List of specific events to be filtered")]
    #[arg(value_delimiter = ',')]
    pub events: Option<Vec<String>>,

    #[arg(short, long)]
    #[arg(help = "Block number from where to look for events")]
    pub from_block: Option<u64>,

    #[arg(short, long)]
    #[arg(help = "Block number until where to look for events")]
    pub to_block: Option<u64>,
    // ... (remaining fields elided in this excerpt)
}
```

If not, I will go with your recommendation and switch it to 0 instead.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

bin/sozo/src/commands/events.rs (1)

110-114: ⚠️ Potential issue

Handle continuation token for complete event retrieval.

The code prints the continuation token but doesn't use it to fetch remaining events within the current block range.

For complete event retrieval, we should process all events within each chunk before moving to the next block range. Would you like me to provide an implementation that correctly handles the continuation token?

🧹 Nitpick comments (1)

bin/sozo/src/commands/events.rs (1)

82-85: Ohayo sensei! Add retry mechanism for RPC calls.

The event fetching could fail due to transient network issues. Consider implementing a retry mechanism with exponential backoff for better reliability.

Here's a suggested implementation using tokio-retry:

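```diff
+use tokio_retry::strategy::{jitter, ExponentialBackoff};
+use tokio_retry::Retry;
+
+            // from_millis(2).factor(50) yields 100ms, 200ms, 400ms before jitter;
+            // from_millis(100) alone would grow by powers of 100 (100ms, 10s, 1000s).
+            let retry_strategy = ExponentialBackoff::from_millis(2)
+                .factor(50)
+                .map(jitter)
+                .take(3);
+
-                let res = provider
-                    .get_events(event_filter, self.continuation_token.clone(), self.chunk_size)
-                    .await?;
+                // The strategy is consumed by each spawn (once per chunk), so clone it;
+                // each attempt builds a fresh request, so the filter is cloned per call.
+                let res = Retry::spawn(retry_strategy.clone(), || {
+                    provider.get_events(
+                        event_filter.clone(),
+                        self.continuation_token.clone(),
+                        self.chunk_size,
+                    )
+                })
+                .await?;
```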

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 050c2ee and 77bee69.

📒 Files selected for processing (1)

bin/sozo/src/commands/events.rs (2 hunks)

🔇 Additional comments (1)

bin/sozo/src/commands/events.rs (1)

19-21: Ohayo sensei! Let's validate the MAX_BLOCK_RANGE through benchmarking.

The constant value of 50,000 blocks needs validation through benchmarking to ensure optimal performance across different network conditions and load scenarios.

Run this script to analyze historical event fetching performance:

✅ Verification successful

Ohayo sensei! The MAX_BLOCK_RANGE value of 50,000 is reasonable as a default.

After analyzing similar implementations in Katana RPC and Saya, this value appears to be a good balance between performance and node load. The chunking mechanism allows for efficient pagination while preventing overwhelming the node.

- Consider making this configurable via CLI or config file for different deployment scenarios, similar to Saya's approach.
- Keep the "pending benchmarking" comment, as it correctly indicates that this value might need adjustment based on specific network conditions.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

```bash
#!/bin/bash
# Description: Analyze event fetching performance with different block ranges
# Test ranges: 10k, 25k, 50k, 75k, 100k blocks

for range in 10000 25000 50000 75000 100000; do
    echo "Testing block range: $range"
    time cargo run --bin sozo events \
        --from-block $(($(cat .block_number) - range)) \
        --to-block $(cat .block_number) 2>&1 | \
        grep -E "real|Continuation token"
    echo "----------------------------------------"
done
```
Length of output: 2357

Script:

```bash
#!/bin/bash
# Search for benchmark files and similar constants
echo "=== Searching for benchmark files ==="
fd bench
echo -e "\n=== Searching for similar block range constants ==="
rg "BLOCK_RANGE|block_range" -A 2
echo -e "\n=== Looking for event-related test files ==="
fd "test.*events"
```
Length of output: 5128

Script:

```bash
#!/bin/bash
# Check contents of benchmark files
echo "=== Checking sozo benchmarks ==="
cat bin/sozo/benches/* 2>/dev/null || echo "No benchmark files found"
echo -e "\n=== Checking if events.rs has tests ==="
find bin/sozo/src -type f -name "*test*.rs" -exec grep -l "events" {} \;
```
Length of output: 1881

coderabbitai · 2025-01-22T17:23:08Z

bin/sozo/src/commands/events.rs

```diff
+            let latest_block = provider.block_number().await?;
+            let from_block = self.from_block.unwrap_or_else(|| latest_block.saturating_sub(1000));
+            let to_block = self.to_block.unwrap_or(latest_block);
+
```


🛠️ Refactor suggestion

Consider world_block configuration for initial block range.

Based on previous discussions, we should consider using world_block from the configuration when available, before falling back to the 1000-block window. This ensures consistency with the world configuration while maintaining the reasonable default window.

Apply this diff to incorporate world_block:

```diff
-let from_block = self.from_block.unwrap_or_else(|| latest_block.saturating_sub(1000));
+let from_block = self.from_block.unwrap_or_else(|| {
+    world_diff
+        .world_info
+        .world_block
+        .unwrap_or_else(|| latest_block.saturating_sub(1000))
+});
```

Committable suggestion skipped: line range outside the PR's diff.
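As a sketch of the precedence discussed here (explicit `--from-block` first, then the world's block from the config, then the 1000-block window before the latest block), the resolution can be written as a small pure function. `resolve_block_range` and `DEFAULT_WINDOW` are hypothetical names, and `world_block` is passed in as a plain `Option<u64>` rather than read from `world_diff`, which is an assumption about its type.

```rust
/// Hypothetical name for the 1000-block fallback window used in the PR.
const DEFAULT_WINDOW: u64 = 1000;

/// Resolve the block range to scan: the CLI `--from-block` wins, then the
/// world block from the config, then a window ending at the latest block.
/// `--to-block` defaults to the latest block.
fn resolve_block_range(
    cli_from: Option<u64>,
    cli_to: Option<u64>,
    world_block: Option<u64>,
    latest_block: u64,
) -> (u64, u64) {
    let from_block = cli_from
        .or(world_block)
        // saturating_sub clamps to 0 on young chains with fewer than 1000 blocks.
        .unwrap_or_else(|| latest_block.saturating_sub(DEFAULT_WINDOW));
    let to_block = cli_to.unwrap_or(latest_block);
    (from_block, to_block)
}
```

Keeping this logic free of the provider makes the defaults easy to unit-test; the caller only needs to fetch `latest_block` once before calling it.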

codecov · 2025-01-22T18:10:04Z

Codecov Report

Attention: Patch coverage is 0% with 42 lines in your changes missing coverage. Please review.

Project coverage is 55.74%. Comparing base (8f54b66) to head (77bee69).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| bin/sozo/src/commands/events.rs | 0.00% | 42 Missing ⚠️ |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #2929      +/-   ##
==========================================
- Coverage   55.75%   55.74%   -0.01%     
==========================================
  Files         445      445              
  Lines       57627    57629       +2     
==========================================
- Hits        32129    32126       -3     
- Misses      25498    25503       +5
```


glihm · 2025-01-23T04:07:52Z

bin/sozo/src/commands/events.rs

```diff
-                keys,
-            };
+            let latest_block = provider.block_number().await?;
+            let from_block = self.from_block.unwrap_or_else(|| latest_block.saturating_sub(1000));
```


As edited in the issue, please consider using the world_block from the config, if it exists. 👍

glihm · 2025-01-23T04:10:53Z

bin/sozo/src/commands/events.rs

```diff
+                if let Some(continuation_token) = res.continuation_token {
+                    println!("Continuation token: {:?}", continuation_token);
+                    println!("----------------------------------------------");
+                }
+
+                current_start = chunk_end + 1;
```


We can't just continue here: the continuation token indicates that there are more events to process, and since the new logic moves on to the next chunk instead of stopping, those events would be lost.

The logic here could change to automatically fetch all the pages (as we do in the world_from_events function), and then display all the events.
That way, the continuation token no longer needs to be exposed to the user. 👍
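The suggested flow (chunk the block range, drain every page of each chunk, then display everything once) can be sketched as below. This is a synchronous illustration under assumptions: `fetch_all_events` is a hypothetical name, the closure `fetch_page` stands in for the provider's async `get_events`, and the `world_from_events` function mentioned above is not reproduced here.

```rust
/// Collect all events in `from_block..=to_block`, querying at most
/// `max_block_range` blocks per request and following continuation tokens
/// within each chunk. `fetch_page(from, to, token)` is a hypothetical stand-in
/// for one RPC page request and returns `(events, next_token)`.
fn fetch_all_events<E, F>(from_block: u64, to_block: u64, max_block_range: u64, mut fetch_page: F) -> Vec<E>
where
    F: FnMut(u64, u64, Option<String>) -> (Vec<E>, Option<String>),
{
    assert!(max_block_range > 0, "max_block_range must be non-zero");
    let mut all_events = Vec::new();
    let mut current_start = from_block;
    while current_start <= to_block {
        let chunk_end = current_start.saturating_add(max_block_range - 1).min(to_block);
        // Drain every page of this chunk before moving on, so no events are dropped.
        let mut token = None;
        loop {
            let (events, next) = fetch_page(current_start, chunk_end, token);
            all_events.extend(events);
            token = next;
            if token.is_none() {
                break;
            }
        }
        if chunk_end == u64::MAX {
            break; // Avoid overflowing `chunk_end + 1`.
        }
        current_start = chunk_end + 1;
    }
    all_events
}
```

Printing then happens once over the returned events, so the continuation token never has to appear in the CLI output.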

Co-authored-by: glihm <[email protected]>

feat: removed world block

5074ac5

coderabbitai bot reviewed Jan 20, 2025


glihm added the contributor label Jan 21, 2025

glihm changed the title ~~feat ( sozo ): removed world block~~ feat(sozo): removed world block Jan 21, 2025

glihm changed the title ~~feat(sozo): removed world block~~ feat(sozo): auto chunk blocks fetching world data Jan 21, 2025

glihm mentioned this pull request Jan 21, 2025

feat(sozo): remove the need of world_block #2863

Closed

7 tasks

glihm reviewed Jan 21, 2025


okhaimie-dev and others added 2 commits January 22, 2025 10:46

Merge branch 'main' into feat/remove-world-block

050c2ee

feat: minor changes

77bee69

coderabbitai bot reviewed Jan 22, 2025


glihm requested changes Jan 23, 2025


okhaimie-dev and others added 2 commits January 27, 2025 06:00

Merge branch 'main' into feat/remove-world-block

c845d89

Update bin/sozo/src/commands/events.rs

70532e8

Co-authored-by: glihm <[email protected]>
