-
Notifications
You must be signed in to change notification settings - Fork 465
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add mz_cluster_deployment_lineage index+view #30706
Conversation
src/catalog/src/builtin.rs
Outdated
* process, but someone turning on a new use case that happens to have the same | ||
* name as a previous but logically distinct use case. | ||
*/ | ||
pub static MZ_CLUSTER_BLUE_GREEN_LINEAGE: LazyLock<BuiltinView> = LazyLock::new(|| BuiltinView { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benesch I figured you'd be the best person to review it. Pretty much the same from your original query here https://github.com/MaterializeInc/console/issues/3345#issue-2589359640 except for a few changes:
- Instead of a search, just does it for all clusters in
mz_clusters
- Maps each id to the current deployment. Similar format to
mz_object_transitive_dependencies
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only skimmed, but if you only made the changes you described, LGTM!
## `mz_cluster_blue_green_lineage` | ||
|
||
The `mz_cluster_blue_green_lineage` table shows the blue/green deployment lineage of all clusters in [`mz_clusters`](../mz_catalog/#mz_clusters). It determines all cluster IDs that are logically the same cluster. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm finding this name/description to be a little confusing. Does it only contain data on clusters that actively undergoing a blue/green deployment, or does it contain historical data about clusters that have gone through blue/green deployments in the past?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it only contain data on clusters that actively undergoing a blue/green deployment, or does it contain historical data about clusters that have gone through blue/green deployments in the past?
Ah it contains historical data about clusters that have gone through blue/green deployments in the past, as well as current clusters! Brings a good point that I should probably make a testdrive/SLT test for this. Converted to draft to make a test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created a test in test/sqllogictest/cluster_blue_green_lineage.slt
. Hopefully it better describes what I'm aiming for! I wonder if it makes sense to change the field description of cluster_id
to "The ID of a cluster, regardless of whether it has been dropped or not. Corresponds to mz_clusters.id
." and cluster_name
to current_deployment_cluster_name
? Then maybe explicitly call out that current_deployment_cluster_id
must exist.
/** | ||
* Traces the blue/green deployment lineage in the audit log to determine all cluster | ||
* IDs that are logically the same cluster. | ||
* cluster_id: The ID of a cluster. | ||
* current_deployment_cluster_id: The cluster ID of the last cluster in | ||
* cluster_id's blue/green lineage. | ||
* cluster_name: The name of the cluster. | ||
* The approach taken is as follows. First, find all extant clusters and add them | ||
* to the result set. Per cluster, we do the following: | ||
* 1. Find the most recent create or rename event. This moment represents when the | ||
* cluster took on its final logical identity. | ||
* 2. Look for a cluster that had the same name (or the same name with `_dbt_deploy` | ||
* appended) that was dropped within one minute of that moment. That cluster is | ||
* almost certainly the logical predecessor of the current cluster. Add the cluster | ||
* to the result set. | ||
* 3. Repeat the procedure until a cluster with no logical predecessor is discovered. | ||
* Limiting the search for a dropped cluster to a window of one minute is a heuristic, | ||
* but one that's likely to be pretty good one. If a name is reused after more | ||
* than one minute, that's a good sign that it wasn't an automatic blue/green | ||
* process, but someone turning on a new use case that happens to have the same | ||
* name as a previous but logically distinct use case. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to block this PR, but maybe we should add explicit information to audit events that relate to blue-green deployments that would allow us to more easily and correctly find this information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potentially! Though the audit log migration from my experience might be brutal itself 😬 . There's been talk about this however https://materializeinc.slack.com/archives/C06GZ7GBKB5/p1731957134123909 but adding the WMR view seemed like the easiest thing to do for now!
@@ -131,6 +131,17 @@ the most recent status for each AWS PrivateLink connection in the system. | |||
| `last_status_change_at` | [`timestamp with time zone`] | Wall-clock timestamp of the connection status change.| | |||
| `status` | [`text`] | | The status of the connection: one of `pending-service-discovery`, `creating-endpoint`, `recreating-endpoint`, `updating-endpoint`, `available`, `deleted`, `deleting`, `expired`, `failed`, `pending`, `pending-acceptance`, `rejected`, or `unknown`. | | |||
|
|||
## `mz_cluster_blue_green_lineage` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we call this "deployment" instead of "blue/green"? I think that's a bit cleaner/clearer!
## `mz_cluster_blue_green_lineage` | |
## `mz_cluster_deployment_lineage` |
<!-- RELATION_SPEC mz_internal.mz_cluster_blue_green_lineage --> | ||
| Field | Type | Meaning | | ||
|-------------------------------------|--------------|----------------------------------------------------------------| | ||
| `cluster_id` | [`text`] | The ID of the cluster. Corresponds to [`mz_clusters.id`](../mz_catalog/#mz_clusters). | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| `cluster_id` | [`text`] | The ID of the cluster. Corresponds to [`mz_clusters.id`](../mz_catalog/#mz_clusters). | | |
| `cluster_id` | [`text`] | The ID of the cluster. Corresponds to [`mz_clusters.id`](../mz_catalog/#mz_clusters) (though the cluster may no longer exist). | |
src/catalog/src/builtin.rs
Outdated
* process, but someone turning on a new use case that happens to have the same | ||
* name as a previous but logically distinct use case. | ||
*/ | ||
pub static MZ_CLUSTER_BLUE_GREEN_LINEAGE: LazyLock<BuiltinView> = LazyLock::new(|| BuiltinView { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only skimmed, but if you only made the changes you described, LGTM!
We need this view+index to determine the logical identity of clusters between blue green deployments in the Console.
We need this view+index to determine the logical identity of clusters between blue green deployments in the Console.
The followup Console PR is here https://github.com/MaterializeInc/console/pull/3556
Motivation
https://github.com/MaterializeInc/console/issues/3345
Tips for reviewer
Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way), then it is tagged with aT-proto
label.