Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The main branch is not compatible with gemini-3g network #2211

Closed
NingLin-P opened this issue Nov 9, 2023 · 8 comments · Fixed by #2216
Closed

The main branch is not compatible with gemini-3g network #2211

NingLin-P opened this issue Nov 9, 2023 · 8 comments · Fixed by #2216

Comments

@NingLin-P
Copy link
Member

Recently, we have introduced a new field to the ProofOfElection:
https://github.com/subspace/subspace/blob/53dca169379e65b4fb97b5c7753f5d00bded2ef2/crates/sp-domains/src/lib.rs#L546-L547

This causes the main branch not compatible with the gemini-3g, because when we use runtime API to extract bundle/ER from the consensus chain, it will return the old struct that doesn't have the new field while the client will try to parse it with the new field, thus failed and cause the node shut down.

In this case, I'm not sure what is the best practice to handle this, do we need to keep the old struct definition in the code and use the runtime API version to determine parsing the old struct or new struct? cc @vedhavyas

@vedhavyas
Copy link
Member

That is unfortunate. Can you confirm the if submit_bundle is working as expected since that also should be affected by this change I believe 🤔

One approach I'm seeing here is to introduce the custom Decode and Encode functions until the runtime is upgraded.

@NingLin-P
Copy link
Member Author

That is unfortunate. Can you confirm the if submit_bundle is working as expected since that also should be affected by this change I believe

Yeah, it also affects submit_bundle with error:

2023-11-09T10:31:02.159912Z [Domain] panicked at /code/crates/subspace-runtime/src/lib.rs:810:1:
Bad input data provided to submit_bundle_unsigned: Codec error

One approach I'm seeing here is to introduce the custom Decode and Encode functions until the runtime is upgraded.

It is still hard to get the runtime version within the encode/decode function.

Since the new field causes the incompatible, other approaches would be:

  • Remove the field temporarily (thus disable bundle equivocation proof verification) until next network
  • Fork the gemini-3g maintenance branch without including the new field

@nazar-pc
Copy link
Member

nazar-pc commented Nov 9, 2023

@vedhavyas @NingLin-P I want to make a new snapshot, do we have a solution here?

@vedhavyas
Copy link
Member

So, the decode can selectively chose fields depending on older or newer runtime and for bundle submission, we can introduce the api version check to submit older type or newer type.

This will take some time to implement. If you @nazar-pc want to make a snapshot right now, then fastest way is to just comment the field and the verification like @NingLin-P suggested

@nazar-pc
Copy link
Member

nazar-pc commented Nov 9, 2023

I do want to make snapshot once #2208 is merged, so please un-break it somehow so we can move forward.

@vedhavyas
Copy link
Member

We are trying to skip the encoding and decoding while we figure out next steps.
Also, the breakage is limited to just operators and domains and not the consensus chain.
So this should not affect consensus nodes even if you make a snapshot without the change

@nazar-pc
Copy link
Member

nazar-pc commented Nov 9, 2023

We already have some operators on the network and it is not that unlikely that people will attempt to upgrade when new release comes out, so if we can fix it quickly we probably should.

@NingLin-P
Copy link
Member Author

I found another change that cause the main not compatible with gemini-3g, which is we have change the bundle_digest value in the ER to keep consistency with the domain block extrinsic root fraud proof:
56a90b8#diff-c719de56a192758914b38c863eab11b914a2c7c216e51d97164f9589d6fbb91fL231-R232

which will cause the operator of the main branch produced a different ER than the gemini-3g's operator, I think this is what cause the #[codec(skip)] approach don't work and result in UnknownParentBlockReceipt.

TLDR; The incompatible is not solve completely by #2214, and we can take the #[codec(skip)] approach to fix the runtime API incompatible instead and we also need to fix the bundle_digest incompatible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants