Feature/graceful reconfig cancel #28092
src/adapter/src/coord/ddl.rs
Outdated
let reconfiguring_clusters = self
.active_conns
.get(conn_id)
.expect("must exist for active session")
.pending_cluster_alters
.clone();
self.drop_reconfiguration_replicas(reconfiguring_clusters)
.await;
self.active_conns
.get_mut(conn_id)
.expect("must exist for active session")
.pending_cluster_alters
.clear();
I'd guess there's a much cleaner way of doing this in Rust; I tried a couple of iterations that failed and moved on.
Either
let reconfiguring_clusters = self
.active_conns
.get_mut(conn_id)
.expect("must exist for active session")
.pending_cluster_alters
.drain(..)
.collect();
self.drop_reconfiguration_replicas(reconfiguring_clusters)
.await;
or
let reconfiguring_clusters = std::mem::take(&mut self
.active_conns
.get_mut(conn_id)
.expect("must exist for active session")
.pending_cluster_alters);
self.drop_reconfiguration_replicas(reconfiguring_clusters)
.await;
would probably work.
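As a minimal sketch of the std::mem::take variant suggested above: the `ConnState` struct and the `take_pending` helper below are hypothetical stand-ins, not the actual adapter types, and the field is assumed to be a collection that implements `Default` (a `BTreeSet<u64>` is used here purely for illustration). `std::mem::take` takes a `&mut T`, swaps in `T::default()`, and returns the old value, so the clone followed by a clear collapses into one step and no borrow of the connection state is held afterwards.

```rust
use std::collections::BTreeSet;

// Hypothetical stand-in for the per-connection state; not the real Materialize type.
#[derive(Default)]
struct ConnState {
    pending_cluster_alters: BTreeSet<u64>,
}

/// Moves the pending set out of the connection state and leaves an
/// empty set (the `Default` value) in its place.
fn take_pending(conn: &mut ConnState) -> BTreeSet<u64> {
    std::mem::take(&mut conn.pending_cluster_alters)
}

fn main() {
    let mut conn = ConnState::default();
    conn.pending_cluster_alters.extend([1, 2, 3]);

    // The returned set is owned, so it can be passed to later work
    // without keeping a borrow of `conn` alive.
    let taken = take_pending(&mut conn);

    assert_eq!(taken.len(), 3);
    assert!(conn.pending_cluster_alters.is_empty());
}
```

Whether the drain(..) variant applies depends on the concrete collection type of the field, since not every standard collection provides a `drain` method.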
src/adapter/src/coord/ddl.rs
Outdated
for replica_drops in pending_replica_drops {
self.catalog_transact(None, vec![Op::DropObjects(replica_drops)])
.await
.unwrap_or_terminate("cannot fail to drop replicas");
}
Why do we do a single transaction per replica instead of a single transaction with all replicas?
Confusingly, if I flatten the above Vec of Vecs and do a single transaction, there are a lot of test failures that seem completely unrelated. For example, ./mzcompose --find testdrive run default blue-green.td fails, as does cargo test.
Full test run:
https://buildkite.com/materialize/test/builds/88225#01914d7d-df74-4eba-ba28-33b389a503e0
I can't imagine a single connection will have more than one cluster alter going at any given time, especially given that the execution is blocking. I'm pretty OK with leaving this as a single transaction per cluster, but I am a bit confused and curious about why a single transaction for all replicas has this impact.
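For reference, the two shapes being compared can be sketched as follows. This is only an illustration of the data flow: the `Op` enum, the `u32` object IDs, and both helper functions are hypothetical stand-ins rather than the real catalog types, and the sketch does not explain the test failures mentioned above.

```rust
// Hypothetical stand-in for a catalog operation; not the real `Op` type.
#[derive(Debug, PartialEq)]
enum Op {
    DropObjects(Vec<u32>),
}

/// One batch of ops per cluster: each inner Vec becomes its own transaction.
fn ops_per_cluster(pending: Vec<Vec<u32>>) -> Vec<Vec<Op>> {
    pending
        .into_iter()
        .map(|drops| vec![Op::DropObjects(drops)])
        .collect()
}

/// A single batch: the Vec of Vecs is flattened into one `DropObjects` op.
fn ops_single_transaction(pending: Vec<Vec<u32>>) -> Vec<Op> {
    vec![Op::DropObjects(pending.into_iter().flatten().collect())]
}

fn main() {
    let pending = vec![vec![1, 2], vec![3]];

    // Two transactions, one per cluster.
    let per_cluster = ops_per_cluster(pending.clone());
    assert_eq!(per_cluster.len(), 2);

    // One transaction covering every replica.
    let single = ops_single_transaction(pending);
    assert_eq!(single, vec![Op::DropObjects(vec![1, 2, 3])]);
}
```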
LGTM
Motivation
Adds cancellation to ALTER CLUSTER for graceful reconfiguration.
This PR is stacked on the following PRs
Tips for reviewer
Checklist
$T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.