Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SHiP: tests for restarting from corrupted log and index files #983

Merged
merged 6 commits into from
Nov 5, 2024

Conversation

linh2931
Copy link
Member

@linh2931 linh2931 commented Oct 29, 2024

Added a suite of tests to verify SHiP recovers when restarted in a variety of log and index corruption scenarios.

Resolves #947

@linh2931 linh2931 marked this pull request as ready for review October 30, 2024 14:00
return True

# Verifies that SHiP should fail to restart with a corrupted first entry header
def corruptedHeaderTest(pos, curruptedValue, shipNode):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sp: corruptedValue

stateHistoryIndex = ""

# Returns True if file1 and file2 contain the same content
def equalFiles(file1, file2):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use Utils.compareFiles instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I did not notice that function. I just need to add an extra argument for file mode (the original version does not work for binary file comparison).

@ericpassmore
Copy link
Contributor

Note:start
category: Tests
component: SHiP
summary: Added a suite of tests to verify SHiP recovers when restarted in a variety of log and index corruption scenarios.
Note:end

Print("Stand up cluster")

specificExtraNodeosArgs={}
specificExtraNodeosArgs[shipNodeId]="--plugin eosio::state_history_plugin --trace-history --chain-state-history --finality-data-history --state-history-stride 200 --plugin eosio::net_api_plugin --plugin eosio::producer_api_plugin"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the chain ever make it past block 200 to exercise anything other then the simple "flat log" case? But I suppose there is probably not much that can be done for a corrupted non-head split log anyways -- fixing it just makes a hole that isn't allowed.

I do wonder if we should do some additional corruption tests with pruned logs. But those are even less repairable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the chain ever make it past block 200 to exercise anything other then the simple "flat log" case? But I suppose there is probably not much that can be done for a corrupted non-head split log anyways -- fixing it just makes a hole that isn't allowed.

I do wonder if we should do some additional corruption tests with pruned logs. But those are even less repairable.

Thanks. I created #1011 to track additional tests for corrupted pruned and split tests. I think they should be done as a part of the bigger initiative to revisit how corrupted logs are handled (repairing at the startup or offline via spring-util.

I am merging the PR in current form so we have some basic tests.

@linh2931 linh2931 merged commit 5179917 into main Nov 5, 2024
36 checks passed
@linh2931 linh2931 deleted the ship_start_from_corrupted branch November 5, 2024 15:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ensure that plugin recovers when restarted with corrupted SHiP log
5 participants