Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adapter: Record the amount of time it takes to run RowSetFinishing #28215

Merged

Conversation

ParkMyCar
Copy link
Member

@ParkMyCar ParkMyCar commented Jul 12, 2024

This PR adds a Prometheus histogram to instrument RowSetFinishing::finish.

It also adds a SessionMetrics struct which is a new field on Session as a way to pass around this new metric. RowSetFinishing::finish is not always run from the context that has access to a Coordinator or AdapterClient, adding SessionMetrics seemed like the cleanest way to move this metric around.

Motivation

This is an action item that came out of the Incident Response Working Group. From a product quality perspective though we also don't have much visibility into how long this RowSetFinishing::finish currently takes.

Tips for reviewers

Hide whitespace

Checklist

@ParkMyCar ParkMyCar requested review from a team as code owners July 12, 2024 21:33
@ParkMyCar ParkMyCar requested a review from maddyblue July 12, 2024 21:33
}

pub(crate) fn row_set_finishing_seconds(&self) -> Histogram {
self.row_set_finishing_seconds.with_label_values(&[])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to propose we add a label here for sorted=true|false. Then I read the code and it looks like we always sort results even if a user didn't specify an ORDER BY? Although in that case it seems that no actual row movement would occur, but we'd still have to do O(something) to verify that. Is this correct? What's the something in those cases? If we are returning 1M+ rows, does the something add up even if there's no swaps?

If something ends up being slowish, it's worth considering delaying this PR so that can get fixed, then we can add a sorted label here.

Thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something I realized recently when chatting with Frank and Moritz about where peek responses spend time. You're right, we do always sort rows even without an ORDER BY, what happens then is the tiebreaker gets called, which in this case is is just a comparison of the byte length of a row. We thought about not doing it but having rows come back in a deterministic way is nice especially for tests. In either case though we don't do any swaps in memory, we create a sorted "view" of the rows by sorting a Vec<usize>, it turns out for 1M+ rows this can actually be quite slow.

We chatted about this in Slack, the proposed fix is doing the sorting in clusterd and then environmentd can do a streaming merge. This is another reason I wanted to add this metric, so we can more easily measure any improvements we try to make

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Counterpoint against determinism at this level: having the rows returned in a deterministic order can sometimes be bad for users who don't know that, by the spec, the if there's no ORDER BY we can return in any order. They can sometimes rely on an observed order in their application code, then if we change the implementation they get bit. Go, for example, does runtime randomization of map iteration order to prevent programs from relying on something out-of-spec.

If our tests need to rely on an order, they should be using the in-spec behavior using ORDER BY and not using out-of-spec behavior.

I'm ok merging is as, but think it's bad to have things, even tests, depend on out-of-spec behavior.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great point, I filed https://github.com/MaterializeInc/materialize/issues/28261 to track and put it on the agenda for our team's weekly

}

pub(crate) fn row_set_finishing_seconds(&self) -> Histogram {
self.row_set_finishing_seconds.with_label_values(&[])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Counterpoint against determinism at this level: having the rows returned in a deterministic order can sometimes be bad for users who don't know that, by the spec, the if there's no ORDER BY we can return in any order. They can sometimes rely on an observed order in their application code, then if we change the implementation they get bit. Go, for example, does runtime randomization of map iteration order to prevent programs from relying on something out-of-spec.

If our tests need to rely on an order, they should be using the in-spec behavior using ORDER BY and not using out-of-spec behavior.

I'm ok merging is as, but think it's bad to have things, even tests, depend on out-of-spec behavior.

@ParkMyCar ParkMyCar merged commit c3fe6c4 into MaterializeInc:main Jul 15, 2024
77 checks passed
@ParkMyCar ParkMyCar deleted the adapter/instrument-row-set-finishing branch October 16, 2024 13:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants