Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/graceful reconfig cancel #28092

Merged
merged 1 commit into from
Aug 16, 2024

Conversation

jubrad
Copy link
Contributor

@jubrad jubrad commented Jul 8, 2024

Motivation

Adds cancelation to alter cluster for graceful reconfiguration.

This PR is stacked on the following PRs

Tips for reviewer

Checklist

Comment on lines 977 to 1021
let reconfiguring_clusters = self
.active_conns
.get(conn_id)
.expect("must exist for active session")
.pending_cluster_alters
.clone();
self.drop_reconfiguration_replicas(reconfiguring_clusters)
.await;
self.active_conns
.get_mut(conn_id)
.expect("must exist for active session")
.pending_cluster_alters
.clear();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd guess there's a much cleaner way of doing this rust, I gave it a couple of failed iterations and moved on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either

        let reconfiguring_clusters = self
            .active_conns
            .get_mut(conn_id)
            .expect("must exist for active session")
            .pending_cluster_alters
            .drain(..)
            .collect();
        self.drop_reconfiguration_replicas(reconfiguring_clusters)
            .await;

or

        let reconfiguring_clusters = std::mem::take(self
            .active_conns
            .get_mut(conn_id)
            .expect("must exist for active session")
            .pending_cluster_alters);
        self.drop_reconfiguration_replicas(reconfiguring_clusters)
            .await;

would probably work.

@jubrad jubrad force-pushed the feature/graceful-reconfig-cancel branch 3 times, most recently from af947d9 to cad2501 Compare July 14, 2024 05:00
@jubrad jubrad force-pushed the feature/graceful-reconfig-cancel branch 2 times, most recently from 4582bcf to 688e448 Compare August 1, 2024 17:25
@jubrad jubrad force-pushed the feature/graceful-reconfig-cancel branch from 688e448 to 2bab0d1 Compare August 7, 2024 03:21
@jubrad jubrad force-pushed the feature/graceful-reconfig-cancel branch 3 times, most recently from 4d09357 to 0e5dab0 Compare August 9, 2024 14:32
@jubrad jubrad marked this pull request as ready for review August 9, 2024 15:21
@jubrad jubrad requested review from a team as code owners August 9, 2024 15:21
@jubrad jubrad requested a review from jkosh44 August 9, 2024 15:21
src/adapter/src/coord.rs Show resolved Hide resolved
src/adapter/src/coord/sequencer/inner/cluster.rs Outdated Show resolved Hide resolved
src/adapter/src/coord/sequencer/inner/cluster.rs Outdated Show resolved Hide resolved
Comment on lines 954 to 964
for replica_drops in pending_replica_drops {
self.catalog_transact(None, vec![Op::DropObjects(replica_drops)])
.await
.unwrap_or_terminate("cannot fail to drop replicas");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we do a single transaction per replica instead of a single transaction with all replicas?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

confusingly, if I flatten the above Vec of Vecs and do a single transaction there are a lot of test failures that seem completely unrelated. For example, the following test fails ./mzcompose --find testdrive run default blue-green.td as well as cargo test.

Full test run:
https://buildkite.com/materialize/test/builds/88225#01914d7d-df74-4eba-ba28-33b389a503e0

I can't imagine a single connection will have more than cluster alter going at any given time, especially given that the execution is blocking. I'm pretty ok with leaving this a single transaction per cluster, but I am a bit confused and curious about why a single transaction for all replicas has this impact.

Comment on lines 977 to 1021
let reconfiguring_clusters = self
.active_conns
.get(conn_id)
.expect("must exist for active session")
.pending_cluster_alters
.clone();
self.drop_reconfiguration_replicas(reconfiguring_clusters)
.await;
self.active_conns
.get_mut(conn_id)
.expect("must exist for active session")
.pending_cluster_alters
.clear();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either

        let reconfiguring_clusters = self
            .active_conns
            .get_mut(conn_id)
            .expect("must exist for active session")
            .pending_cluster_alters
            .drain(..)
            .collect();
        self.drop_reconfiguration_replicas(reconfiguring_clusters)
            .await;

or

        let reconfiguring_clusters = std::mem::take(self
            .active_conns
            .get_mut(conn_id)
            .expect("must exist for active session")
            .pending_cluster_alters);
        self.drop_reconfiguration_replicas(reconfiguring_clusters)
            .await;

would probably work.

@jubrad jubrad force-pushed the feature/graceful-reconfig-cancel branch 4 times, most recently from 062cc8f to cf965d9 Compare August 14, 2024 03:36
@jubrad jubrad requested a review from jkosh44 August 14, 2024 14:51
@jubrad jubrad force-pushed the feature/graceful-reconfig-cancel branch from cf965d9 to 94de0e2 Compare August 16, 2024 01:18
Copy link
Contributor

@jkosh44 jkosh44 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jubrad jubrad merged commit bdaca97 into MaterializeInc:main Aug 16, 2024
79 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Aug 16, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants