-
Notifications
You must be signed in to change notification settings - Fork 190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(sozo): auto chunk blocks fetching world data #2929
base: main
Are you sure you want to change the base?
Conversation
WalkthroughOhayo, sensei! The changes in the events command focus on enhancing event retrieval efficiency by introducing a maximum block range of 50,000 blocks. The new implementation processes events in chunks, ensuring more controlled and predictable fetching. Instead of handling events in a single batch, the code now breaks down the retrieval into manageable ranges, adjusting the Changes
Sequence DiagramsequenceDiagram
participant User
participant EventCommand
participant Provider
User->>EventCommand: Request events
EventCommand->>Provider: Get latest block number
Provider-->>EventCommand: Return latest block number
EventCommand->>EventCommand: Adjust from_block and to_block
loop Block Range Chunks
EventCommand->>Provider: Retrieve events for chunk
Provider-->>EventCommand: Return events
EventCommand->>User: Display events and continuation token
end
📜 Recent review detailsConfiguration used: .coderabbit.yaml 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
bin/sozo/src/commands/events.rs (3)
20-20
: Consider makingMAX_BLOCK_RANGE
configurableDefining
MAX_BLOCK_RANGE
as a constant is fine, but making it a configurable parameter would enhance flexibility. This allows users to adjust the block range without modifying the code, improving usability in different scenarios.
68-73
: Simplifyfrom_block
assignmentThe nested match statements for assigning
from_block
can be simplified sinceBlockId::Number(block)
will always matchBlockId::Number(num)
. Here's a streamlined version:-let from_block = match self.from_block { - Some(block) => match BlockId::Number(block) { - BlockId::Number(num) => num, - _ => latest_block.saturating_sub(1000), - }, - None => latest_block.saturating_sub(1000), -}; +let from_block = self.from_block.unwrap_or_else(|| latest_block.saturating_sub(1000));
75-80
: Simplifyto_block
assignmentSimilarly, the assignment of
to_block
can be simplified for clarity:-let to_block = match self.to_block { - Some(block) => match BlockId::Number(block) { - BlockId::Number(num) => num, - _ => latest_block, - }, - None => latest_block, -}; +let to_block = self.to_block.unwrap_or(latest_block);
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
bin/sozo/src/commands/events.rs
(2 hunks)
🔇 Additional comments (1)
bin/sozo/src/commands/events.rs (1)
85-131
: Verify block chunking logic for off-by-one errorsPlease verify that the calculation of
chunk_end
and the update ofcurrent_start
correctly process all blocks without overlap or gaps. Specifically, ensure that all blocks fromfrom_block
toto_block
are processed exactly once.Here's a script to check the block ranges:
✅ Verification successful
Ohayo! Block chunking logic is implemented correctly sensei!
The verification confirms that:
- All blocks are processed exactly once
- No gaps or overlaps exist between chunks
- Edge cases, including the final block, are handled properly
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for overlaps or gaps in block ranges. START_BLOCK=1000 END_BLOCK=2000 MAX_BLOCK_RANGE=500 current_start=$START_BLOCK while [ $current_start -le $END_BLOCK ]; do chunk_end=$((current_start + MAX_BLOCK_RANGE - 1)) if [ $chunk_end -gt $END_BLOCK ]; then chunk_end=$END_BLOCK fi echo "Processing blocks from $current_start to $chunk_end" current_start=$((chunk_end + 1)) doneLength of output: 620
if let Some(continuation_token) = res.continuation_token { | ||
println!("Continuation token: {:?}", continuation_token); | ||
println!("----------------------------------------------"); | ||
} | ||
|
||
current_start = chunk_end + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle continuation_token
for paginated event retrieval
When retrieving events, if a continuation_token
is returned, it indicates more events are available within the current block range. Currently, the code prints the continuation_token
but doesn't use it to fetch additional events. To ensure all events are retrieved, loop until res.continuation_token
is None
, using the token in subsequent get_events
calls.
Would you like me to provide a revised implementation that correctly handles the continuation_token
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@okhaimie-dev thanks for the work here. Also, I'll update the issue, since we have other places where the world_block
is actually used.
@@ -17,6 +17,8 @@ use super::options::starknet::StarknetOptions; | |||
use super::options::world::WorldOptions; | |||
use crate::utils; | |||
|
|||
const MAX_BLOCK_RANGE: u64 = 50_000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment mentioning that it's an arbitrary value for now that seems to work (and we need to validate that too).
bin/sozo/src/commands/events.rs
Outdated
} else { | ||
self.from_block.map(BlockId::Number) | ||
}; | ||
// Get latest block number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Get latest block number |
Don't use comment when it's self explanatory. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright
bin/sozo/src/commands/events.rs
Outdated
let from_block = match self.from_block { | ||
Some(block) => match BlockId::Number(block) { | ||
BlockId::Number(num) => num, | ||
_ => latest_block.saturating_sub(1000), | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep using the world_block
from the config, since we still want to start at a later block to avoid indexing the whole blockchain from block 0.
bin/sozo/src/commands/events.rs
Outdated
BlockId::Number(num) => num, | ||
_ => latest_block.saturating_sub(1000), | ||
}, | ||
None => latest_block.saturating_sub(1000), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was to have a reasonable default window to start fetching event when the event is used. And since you recommend to remove the world_block
I figured starting from around 24hr from the latest block would be reasonable.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, so if there's a world block, we take it. If not, you would start 1000 blocks before the latest?
In this case, it may be better starting to 0
actually. Like so, the default behavior is something easy to grasp for user. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it will be better to start from block 0
because of how long it will take if they are fetching events from mainnet
or sepolia
, but I see what you are saying. Maybe it will be worth to leave a note in the sozo event
doc to let user know the default behavior of starting to fetch from blocks produced 24hrs prior if a from_block
is not provided
#[derive(Debug, Args)]
pub struct EventsArgs {
#[arg(help = "List of specific events to be filtered")]
#[arg(value_delimiter = ',')]
pub events: Option<Vec<String>>,
#[arg(short, long)]
#[arg(help = "Block number from where to look for events")]
pub from_block: Option<u64>,
#[arg(short, long)]
#[arg(help = "Block number until where to look for events")]
pub to_block: Option<u64>,
If not, I will go with your recommendation and switch it 0
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
bin/sozo/src/commands/events.rs (1)
110-114
:⚠️ Potential issueHandle continuation token for complete event retrieval.
The code prints the continuation token but doesn't use it to fetch remaining events within the current block range.
For complete event retrieval, we should process all events within each chunk before moving to the next block range. Would you like me to provide an implementation that correctly handles the continuation token?
🧹 Nitpick comments (1)
bin/sozo/src/commands/events.rs (1)
82-85
: Ohayo sensei! Add retry mechanism for RPC calls.The event fetching could fail due to transient network issues. Consider implementing a retry mechanism with exponential backoff for better reliability.
Here's a suggested implementation using
tokio-retry
:+use tokio_retry::strategy::{ExponentialBackoff, jitter}; +use tokio_retry::Retry; + + let retry_strategy = ExponentialBackoff::from_millis(100) + .map(jitter) + .take(3); + - let res = provider + let res = Retry::spawn(retry_strategy, || provider .get_events(event_filter, self.continuation_token.clone(), self.chunk_size) - .await?; + .boxed()) + .await?;
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
bin/sozo/src/commands/events.rs
(2 hunks)
🔇 Additional comments (1)
bin/sozo/src/commands/events.rs (1)
19-21
: Ohayo sensei! Let's validate the MAX_BLOCK_RANGE through benchmarking.The constant value of 50,000 blocks needs validation through benchmarking to ensure optimal performance across different network conditions and load scenarios.
Run this script to analyze historical event fetching performance:
✅ Verification successful
Ohayo sensei! The MAX_BLOCK_RANGE value of 50,000 is reasonable as a default.
After analyzing similar implementations in Katana RPC and Saya, this value appears to be a good balance between performance and node load. The chunking mechanism allows for efficient pagination while preventing overwhelming the node.
- Consider making this configurable via CLI or config file for different deployment scenarios, similar to Saya's approach
- Keep the "pending benchmarking" comment as it correctly indicates this value might need adjustment based on specific network conditions
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Analyze event fetching performance with different block ranges # Test ranges: 10k, 25k, 50k, 75k, 100k blocks for range in 10000 25000 50000 75000 100000; do echo "Testing block range: $range" time cargo run --bin sozo events \ --from-block $(($(cat .block_number) - range)) \ --to-block $(cat .block_number) 2>&1 | \ grep -E "real|Continuation token" echo "----------------------------------------" doneLength of output: 2357
Script:
#!/bin/bash # Search for benchmark files and similar constants echo "=== Searching for benchmark files ===" fd bench echo -e "\n=== Searching for similar block range constants ===" rg "BLOCK_RANGE|block_range" -A 2 echo -e "\n=== Looking for event-related test files ===" fd "test.*events"Length of output: 5128
Script:
#!/bin/bash # Check contents of benchmark files echo "=== Checking sozo benchmarks ===" cat bin/sozo/benches/* 2>/dev/null || echo "No benchmark files found" echo -e "\n=== Checking if events.rs has tests ===" find bin/sozo/src -type f -name "*test*.rs" -exec grep -l "events" {} \;Length of output: 1881
let latest_block = provider.block_number().await?; | ||
let from_block = self.from_block.unwrap_or_else(|| latest_block.saturating_sub(1000)); | ||
let to_block = self.to_block.unwrap_or(latest_block); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider world_block configuration for initial block range.
Based on previous discussions, we should consider using world_block
from the configuration when available, before falling back to the 1000-block window. This ensures consistency with the world configuration while maintaining the reasonable default window.
Apply this diff to incorporate world_block:
- let from_block = self.from_block.unwrap_or_else(|| latest_block.saturating_sub(1000));
+ let from_block = self.from_block.unwrap_or_else(|| {
+ world_diff.world_info
+ .world_block
+ .unwrap_or_else(|| latest_block.saturating_sub(1000))
+ });
Committable suggestion skipped: line range outside the PR's diff.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2929 +/- ##
==========================================
- Coverage 55.75% 55.74% -0.01%
==========================================
Files 445 445
Lines 57627 57629 +2
==========================================
- Hits 32129 32126 -3
- Misses 25498 25503 +5 ☔ View full report in Codecov by Sentry. |
keys, | ||
}; | ||
let latest_block = provider.block_number().await?; | ||
let from_block = self.from_block.unwrap_or_else(|| latest_block.saturating_sub(1000)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As edited in the issue, please consider using the world_block
from the config, if it exists. 👍
if let Some(continuation_token) = res.continuation_token { | ||
println!("Continuation token: {:?}", continuation_token); | ||
println!("----------------------------------------------"); | ||
} | ||
|
||
current_start = chunk_end + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't just continue here, since the continuation token will indicate that there are more events to process. However in the new logic, since we're not stopping, those events would be lost.
The logic here could change to automatically fetch all the pages (as we do in the world_from_events
function), and then display all the events.
Doing so, the continuation token is no more required to be exposed to the user. 👍
Description
Related issue
#2825
Tests
Added to documentation?
Checklist
scripts/prettier.sh
,scripts/rust_fmt.sh
,scripts/cairo_fmt.sh
)scripts/clippy.sh
,scripts/docs.sh
)Summary by CodeRabbit
New Features
Performance