Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Epoch-enabled catchup tests #4070

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
Open

Epoch-enabled catchup tests #4070

wants to merge 11 commits into from

Conversation

lukaszrzasik
Copy link
Contributor

@lukaszrzasik lukaszrzasik commented Jan 22, 2025

Closes #<ISSUE_NUMBER>

This PR:

  • Adds more epoch-enabled catchup tests
  • Stores last actioned epoch in TestStorage
  • Adds new trait EpochMessage and implement it for MessageKind. This is useful to extract an epoch from a message to record it.
  • Implements an epoch_number method for SequencingMessage to easily extract an associated epoch using HasEpoch trait of the message's underlying data.
  • Implements HasEpoch trait for the "old" types (types prior to epoch upgrade). The implementation is trivial, returns None.
  • Unfortunately HasEpoch trait does not make sense for the QuorumProposal2 and QuorumProposalWrapper types. These types do not have an epoch field. Instead, the epoch needs to be calculated. Because of that the epoch method of the HasEpoch trait panics for these types.
  • Minor refactor of OverallSafetyTask which includes:
    • the Decide, Error and ReplicaViewTimeout sent by nodes not belonging to the relevant epoch are ignored
    • all leaves sent in the leaf chain attached to the Decide event are taken into account, instead of only the leaf with the highest view. This fixes successful views count.
  • Increases REQUEST_TIMEOUT to 2000 ms.

This PR does not:

Add the test_staggered_restart_with_epochs test. The test is currently commented out. The problem is the test always fails in CI but it never fails for me locally. It makes debugging it much more time consuming. I will continue doing that but I'd like to have the PR merged in the meantime.
After increasing REQUEST_TIMEOUT the test passes.

Key places to review:

HasEpoch implementions for QuorumProposal2 and QuorumProposalWrapper in the crates/types/src/data.rs file. Are the panics fine there?
crates/testing/src/overall_safety_task.rs file

@lukaszrzasik lukaszrzasik requested a review from ss-es as a code owner January 23, 2025 13:24
@lukaszrzasik lukaszrzasik changed the title Lr/store epoch Epoch-enabled catchup tests Jan 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants