adapter: Record the amount of time it takes to run RowSetFinishing #28215

ParkMyCar · 2024-07-12T21:33:45Z

This PR adds a Prometheus histogram to instrument RowSetFinishing::finish.

It also adds a SessionMetrics struct which is a new field on Session as a way to pass around this new metric. RowSetFinishing::finish is not always run from the context that has access to a Coordinator or AdapterClient, adding SessionMetrics seemed like the cleanest way to move this metric around.

Motivation

This is an action item that came out of the Incident Response Working Group. From a product quality perspective though we also don't have much visibility into how long this RowSetFinishing::finish currently takes.

Tips for reviewers

Hide whitespace

Checklist

This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
This PR includes the following user-facing behavior changes:
- N/a

…RowSetFinishing

maddyblue · 2024-07-15T17:39:13Z

src/adapter/src/metrics.rs

+    }
+
+    pub(crate) fn row_set_finishing_seconds(&self) -> Histogram {
+        self.row_set_finishing_seconds.with_label_values(&[])


I was going to propose we add a label here for sorted=true|false. Then I read the code and it looks like we always sort results even if a user didn't specify an ORDER BY? Although in that case it seems that no actual row movement would occur, but we'd still have to do O(something) to verify that. Is this correct? What's the something in those cases? If we are returning 1M+ rows, does the something add up even if there's no swaps?

If something ends up being slowish, it's worth considering delaying this PR so that can get fixed, then we can add a sorted label here.

Thoughts?

This is something I realized recently when chatting with Frank and Moritz about where peek responses spend time. You're right, we do always sort rows even without an ORDER BY, what happens then is the tiebreaker gets called, which in this case is is just a comparison of the byte length of a row. We thought about not doing it but having rows come back in a deterministic way is nice especially for tests. In either case though we don't do any swaps in memory, we create a sorted "view" of the rows by sorting a Vec<usize>, it turns out for 1M+ rows this can actually be quite slow.

We chatted about this in Slack, the proposed fix is doing the sorting in clusterd and then environmentd can do a streaming merge. This is another reason I wanted to add this metric, so we can more easily measure any improvements we try to make

Counterpoint against determinism at this level: having the rows returned in a deterministic order can sometimes be bad for users who don't know that, by the spec, the if there's no ORDER BY we can return in any order. They can sometimes rely on an observed order in their application code, then if we change the implementation they get bit. Go, for example, does runtime randomization of map iteration order to prevent programs from relying on something out-of-spec.

If our tests need to rely on an order, they should be using the in-spec behavior using ORDER BY and not using out-of-spec behavior.

I'm ok merging is as, but think it's bad to have things, even tests, depend on out-of-spec behavior.

That's a great point, I filed https://github.com/MaterializeInc/materialize/issues/28261 to track and put it on the agenda for our team's weekly

maddyblue · 2024-07-15T20:13:14Z

src/adapter/src/metrics.rs

+    }
+
+    pub(crate) fn row_set_finishing_seconds(&self) -> Histogram {
+        self.row_set_finishing_seconds.with_label_values(&[])


Counterpoint against determinism at this level: having the rows returned in a deterministic order can sometimes be bad for users who don't know that, by the spec, the if there's no ORDER BY we can return in any order. They can sometimes rely on an observed order in their application code, then if we change the implementation they get bit. Go, for example, does runtime randomization of map iteration order to prevent programs from relying on something out-of-spec.

If our tests need to rely on an order, they should be using the in-spec behavior using ORDER BY and not using out-of-spec behavior.

I'm ok merging is as, but think it's bad to have things, even tests, depend on out-of-spec behavior.

start, add SessionMetrics, record the amount of time it takes to run …

f80cb7f

…RowSetFinishing

ParkMyCar requested review from a team as code owners July 12, 2024 21:33

ParkMyCar requested a review from maddyblue July 12, 2024 21:33

fix docs

f94883c

maddyblue reviewed Jul 15, 2024

View reviewed changes

maddyblue approved these changes Jul 15, 2024

View reviewed changes

ParkMyCar merged commit c3fe6c4 into MaterializeInc:main Jul 15, 2024
77 checks passed

ParkMyCar deleted the adapter/instrument-row-set-finishing branch October 16, 2024 13:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

adapter: Record the amount of time it takes to run RowSetFinishing #28215

adapter: Record the amount of time it takes to run RowSetFinishing #28215

ParkMyCar commented Jul 12, 2024 •

edited

Loading

maddyblue Jul 15, 2024

ParkMyCar Jul 15, 2024

maddyblue Jul 15, 2024

ParkMyCar Jul 15, 2024

maddyblue Jul 15, 2024

adapter: Record the amount of time it takes to run RowSetFinishing #28215

adapter: Record the amount of time it takes to run RowSetFinishing #28215

Conversation

ParkMyCar commented Jul 12, 2024 • edited Loading

Motivation

Tips for reviewers

Checklist

maddyblue Jul 15, 2024

Choose a reason for hiding this comment

ParkMyCar Jul 15, 2024

Choose a reason for hiding this comment

maddyblue Jul 15, 2024

Choose a reason for hiding this comment

ParkMyCar Jul 15, 2024

Choose a reason for hiding this comment

maddyblue Jul 15, 2024

Choose a reason for hiding this comment

ParkMyCar commented Jul 12, 2024 •

edited

Loading