upgrades: revisit our migration strategy #3506

erwanor · 2023-12-11T20:08:30Z

Is your feature request related to a problem? Please describe.

To perform a chain upgrade, a node operator needs to halt their node, export and migrate the node's chain state. Frequently, but not always, the migration will alter the chain's consensus state. In that case, we must have a process that is designed to be as least intrusive as possible, specifically we want:

block continuity across the halt boundary
the JMT state version and block height remain equal
the post-upgrade application hash is known prior to performing an initialization handshake with comet

Our current approach, is to overwrite the JMT at the pre-upgrade version, as follow:

node operator halts pd at height h-1
node operator export their chain state
node operator runs a migration, overwriting the JMT at version h-1
node operator starts their node again, the chain resumes from height h

A basic implementation of this migration flow is captured by the SimpleUpgrade migration script. The flow relies on the ability to append data to the JMT "in-place" i.e. without increasing its state version. This process, implemented by Storage::commit_in_place is currently broken, and using it will result in dangling JMT nodes that are inaccessible if queried via get_with_proof. So, while the data for that state version is available in our backing store, we cannot generate proofs for it. This is problematic.

A correct implementation of commit_in_place, we will have to build on top of the jmt::restore API, consuming an existing tree state, and using a variation of the add_chunk method to append to the existing logical structure of the tree, rather than completely overwriting it. The jmt::restore API is also needed to implement JMT pruning (#1806) and state pruning is probably a key part of every realistic chain upgrade scenario. The main difference is that the current API requires callers to specify a post-restoration root hash, which we would not be able to provide before the fact.

Describe alternatives you've considered
Alternatively, we could rework the chain upgrade process to include an "offline" block h:

node operator halts pd at height h-1
node operator export their chain state
node operator runs a migration, creating a block h and incrementing the JMT version number
node operator starts their node again, the chain resumes from height h+1

This would require the offline step to setup an App that would be fed synthetic Begin/EndBlocks. This approach has the benefit of not requiring any JMT change, but creates more testing surface, and higher risk of wrongful validator slashing. For example it would require tracking a "fake" uptime for each validator, using a made up timestamp inserted into to the BeginBlock contents.

The text was updated successfully, but these errors were encountered:

hdevalence · 2023-12-12T05:24:35Z

xref #3505

hdevalence · 2023-12-30T05:05:56Z

I think that the approach of creating synthetic Begin/EndBlocks is not a good one compared to doing JMT surgery. While doing surgery on the JMT to edit the version behavior is somewhat tricky, it's self-contained. We only have to do the trickery once, in one place. On the other hand, the synthetic block approach leaks into higher abstraction levels, and interferes with app logic.

hdevalence · 2024-02-09T00:42:07Z

Idea: what if we avoid surgery entirely by reading all of the keys out of the old database and writing them into a new one as part of a schema migration?

Then all the keys and values have a single version and we get pruning for free.

erwanor · 2024-02-09T00:55:24Z

That's what I had in mind here:

we will have to build on top of the jmt::restore API, consuming an existing tree state, and using a variation of the add_chunk method to append to the existing logical structure of the tree, rather than completely overwriting it. The jmt::restore API is also needed to implement JMT pruning (#1806) and state pruning is probably a key part of every realistic chain upgrade scenario. The main difference is that the current API requires callers to specify a post-restoration root hash, which we would not be able to provide before the fact.

This doesn't completely get us pruning for free. It does clean up the logical structure of the tree but we will still have to do some extra work to prune the value store, so it's better to track pruning separately as a superset of this ticket.

erwanor · 2024-03-18T16:16:39Z

The plan outlined in this ticket is accurate, what remains to be done is implementing it.
The specific implementation differs from the previous ones discussed here (restore the tree, or flatten it and rewrite every key). Instead, we will incrementally append to it. The work has started in a prototype penumbra-zone/jmt#110 and is being validated. Once we have high-confidence that it works, we will be able to merge it, and:

jmt: cut a new release
penumbra: bump jmt version
penumbra: use append_value_sets to overwrite the JMT during migrations

Close #3506, this PR: - use `[email protected]` which includes the `migration` feature - bump `borsh` to `1.3.0` to be able to serialize jmt nodes - revamp the substore migration tests to include basic tests and a proptest strategy - notably, the proptest integration tests lay the groundwork for deeper/more targeted strategies

erwanor self-assigned this Dec 11, 2023

github-project-automation bot added this to Testnets Dec 11, 2023

erwanor moved this to Next in Testnets Dec 14, 2023

erwanor mentioned this issue Jan 4, 2024

Release tracking issue: Network upgrades #1804

Open

19 tasks

This was referenced Jan 4, 2024

view: test syncing against a bootstrapped node #3484

Closed

storage: remove commit_in_place #3505

Closed

erwanor mentioned this issue Jan 4, 2024

upgrades: make sure that relaying works across upgrades #3577

Closed

conorsch mentioned this issue Jan 4, 2024

upgrades: private state rollover during testnet #3081

Closed

8 tasks

erwanor added the _P-V1 Priority: slated for V1 release label Jan 24, 2024

aubrika removed this from Testnets Feb 8, 2024

aubrika added this to Penumbra Feb 8, 2024

github-project-automation bot moved this to 🗄️ Backlog in Penumbra Feb 8, 2024

hdevalence added the _P-medium Medium priority label Feb 9, 2024

aubrika added _P-high High priority and removed _P-medium Medium priority labels Feb 21, 2024

aubrika added this to the Sprint 2 milestone Mar 6, 2024

aubrika added the needs-refinement unclear, incomplete, or stub issue that needs work label Mar 6, 2024

cratelyn added the A-upgrades Area: Relates to chain upgrades label Mar 12, 2024

erwanor mentioned this issue Mar 13, 2024

jmt: allow incremental tree overwrites for a given version penumbra-zone/jmt#110

Merged

cratelyn modified the milestones: Sprint 2, Sprint 3 Mar 18, 2024

erwanor removed the needs-refinement unclear, incomplete, or stub issue that needs work label Mar 18, 2024

cratelyn modified the milestones: Sprint 3, Sprint 2 Mar 18, 2024

erwanor mentioned this issue Mar 19, 2024

cnidarium: use incremental jmt migration #4053

Merged

erwanor moved this from 🗄️ Backlog to In progress in Penumbra Mar 20, 2024

erwanor closed this as completed in #4053 Mar 23, 2024

github-project-automation bot moved this from In progress to Done in Penumbra Mar 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

upgrades: revisit our migration strategy #3506

upgrades: revisit our migration strategy #3506

erwanor commented Dec 11, 2023 •

edited

Loading

hdevalence commented Dec 12, 2023

hdevalence commented Dec 30, 2023

hdevalence commented Feb 9, 2024 •

edited

Loading

erwanor commented Feb 9, 2024

erwanor commented Mar 18, 2024 •

edited

Loading

upgrades: revisit our migration strategy #3506

upgrades: revisit our migration strategy #3506

Comments

erwanor commented Dec 11, 2023 • edited Loading

hdevalence commented Dec 12, 2023

hdevalence commented Dec 30, 2023

hdevalence commented Feb 9, 2024 • edited Loading

erwanor commented Feb 9, 2024

erwanor commented Mar 18, 2024 • edited Loading

erwanor commented Dec 11, 2023 •

edited

Loading

hdevalence commented Feb 9, 2024 •

edited

Loading

erwanor commented Mar 18, 2024 •

edited

Loading