adapter,controller: add replica metrics history #29254

Merged

@teskje teskje commented Aug 28, 2024

This PR adds a new builtin source mz_cluster_replica_metrics_history, and a corresponding view mz_cluster_replica_utilization_history on top of it. They contain the data already present in mz_cluster_replica_{metrics,utilization}, but as (pseudo-) append-only collections, rather than retained-metrics collections.

The plan is to remove the old retained-metrics collections in favor of the new append-only ones, but we need to wait until the new ones have had time to backfill the historical data present in the old ones.
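To illustrate the difference between the two shapes of collection, here is a minimal Rust sketch. It is not code from this PR. It assumes that the existing relation holds one current row per replica, and that the history relation gains a new timestamped row for every observation. The type names `MetricsSample`, `LatestMetrics`, and `MetricsHistory` and all field names are hypothetical.

```rust
use std::collections::BTreeMap;

/// One metrics observation for a replica.
/// Hypothetical type; the field names are assumptions for illustration.
#[derive(Debug, Clone, PartialEq)]
struct MetricsSample {
    replica_id: String,
    cpu_nano_cores: u64,
    /// When the sample was taken, in milliseconds since the Unix epoch.
    occurred_at_ms: u64,
}

/// "Current value" shape: a new sample replaces the previous row for
/// the same replica, so only the latest observation is visible.
#[derive(Default)]
struct LatestMetrics {
    by_replica: BTreeMap<String, MetricsSample>,
}

impl LatestMetrics {
    fn record(&mut self, sample: MetricsSample) {
        self.by_replica.insert(sample.replica_id.clone(), sample);
    }
}

/// Append-only shape: every sample becomes a new row and existing rows
/// are never updated, so past observations stay directly visible.
#[derive(Default)]
struct MetricsHistory {
    rows: Vec<MetricsSample>,
}

impl MetricsHistory {
    fn record(&mut self, sample: MetricsSample) {
        self.rows.push(sample);
    }
}
```

Recording two samples for the same replica leaves one row in `LatestMetrics` and two rows in `MetricsHistory`. Because the append-only shape only grows, something has to trim it periodically, which is presumably why the description calls the new collections "(pseudo-) append-only".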

Motivation

  • This PR adds a known-desirable feature.

Part of https://github.com/MaterializeInc/database-issues/issues/8403

Tips for reviewer

Some of our user docs (Troubleshooting, Grafana, Datadog) refer to mz_cluster_replica_utilization. I'm not touching these here, but we'll need to change them when/before we remove that relation.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@teskje teskje force-pushed the mz_cluster_replica_metrics_history branch 2 times, most recently from 8efe34a to eab0341 Compare August 28, 2024 14:26
@teskje teskje changed the title from "catalog,controller: add replica metrics history" to "adapter,controller: add replica metrics history" Aug 28, 2024
@teskje teskje force-pushed the mz_cluster_replica_metrics_history branch 2 times, most recently from dd399ae to 5f544d7 Compare August 29, 2024 12:58
@teskje teskje force-pushed the mz_cluster_replica_metrics_history branch from 5f544d7 to ec750bf Compare August 30, 2024 11:12
This commit adds a new builtin source,
`mz_cluster_replica_metrics_history`, and a corresponding view
`mz_cluster_replica_utilization_history` on top of it. They contain the
data already present in `mz_cluster_replica_{metrics,utilization}`, but
as (pseudo-) append-only collections, rather than retained-metrics
collections.

The plan is to remove the old retained-metrics collections in favor of
the new append-only ones, but we need to wait until the new ones have had
time to backfill the historical data present in the old ones.

This commit does not include any code to partially truncate
`mz_cluster_replica_metrics_history`, which we need in order to ensure
the collection doesn't grow unboundedly. A subsequent commit will deal
with that.
This commit adds truncation of the replica metrics history collection
during bootstrap. In contrast to the existing status history
collections, this collection is truncated based on the age of events
instead of the number of events per ID.

The implementation of the truncation method is kept simple. In the
future it can be refined to avoid reading the entire collection into
memory, if it turns out that the size of a metrics collection is
problematic.
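
The contrast between age-based and count-based truncation can be sketched as follows. This is a simplified illustration, not the code added by this commit. The `Event` type, both function names, and the retention parameter are hypothetical, and the sketch works on an in-memory slice, mirroring the "read the entire collection into memory" approach described above.

```rust
use std::cmp::Reverse;
use std::collections::BTreeMap;
use std::time::Duration;

/// A history event. Hypothetical type; the field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct Event {
    replica_id: String,
    /// Milliseconds since the Unix epoch.
    occurred_at_ms: u64,
}

/// Age-based truncation: returns the events to retract, namely all
/// events that occurred more than `retention` before `now_ms`.
fn expired_events(events: &[Event], now_ms: u64, retention: Duration) -> Vec<Event> {
    // Clamp to u64 so that very large retention periods cannot overflow.
    let retention_ms = retention.as_millis().min(u64::MAX as u128) as u64;
    let cutoff = now_ms.saturating_sub(retention_ms);
    events
        .iter()
        .filter(|e| e.occurred_at_ms < cutoff)
        .cloned()
        .collect()
}

/// Count-based truncation, shown for contrast: returns the events to
/// retract so that at most `keep` of the newest events remain per ID.
fn excess_events_per_id(events: &[Event], keep: usize) -> Vec<Event> {
    let mut by_id: BTreeMap<&str, Vec<&Event>> = BTreeMap::new();
    for e in events {
        by_id.entry(e.replica_id.as_str()).or_default().push(e);
    }
    let mut excess = Vec::new();
    for (_, mut evs) in by_id {
        // Newest first, then drop everything past the first `keep`.
        evs.sort_by_key(|e| Reverse(e.occurred_at_ms));
        excess.extend(evs.into_iter().skip(keep).cloned());
    }
    excess
}
```

Age-based truncation seems a natural fit for metrics, which arrive at a steady rate per replica: a time window bounds the collection's size in a predictable way, whereas a per-ID count would retain a different span of history depending on how often each replica reports.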
@teskje teskje force-pushed the mz_cluster_replica_metrics_history branch from ec750bf to deeaca1 Compare September 1, 2024 08:38
@teskje teskje marked this pull request as ready for review September 1, 2024 12:22
@teskje teskje requested review from a team as code owners September 1, 2024 12:22
ScalarType::TimestampTz { precision: None }.nullable(false),
)
.finish()
});
Contributor Author

This doesn't really fit into healthcheck.rs, but we already have the statement execution history stuff here.

Contributor

@jkosh44 jkosh44 left a comment

Adapter changes LGTM

Contributor

@aljoscha aljoscha left a comment

I only looked at the new truncation method, and that one looked good to me!

@teskje
Contributor Author

teskje commented Sep 4, 2024

TFTRs!

@teskje teskje merged commit 56fcbe4 into MaterializeInc:main Sep 4, 2024
204 of 207 checks passed
@teskje teskje deleted the mz_cluster_replica_metrics_history branch September 4, 2024 08:22
@github-actions github-actions bot locked and limited conversation to collaborators Sep 4, 2024
3 participants