-
Notifications
You must be signed in to change notification settings - Fork 465
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adapter: Record the amount of time it takes to run RowSetFinishing #28215
adapter: Record the amount of time it takes to run RowSetFinishing #28215
Conversation
} | ||
|
||
pub(crate) fn row_set_finishing_seconds(&self) -> Histogram { | ||
self.row_set_finishing_seconds.with_label_values(&[]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to propose we add a label here for sorted=true|false
. Then I read the code and it looks like we always sort results even if a user didn't specify an ORDER BY
? Although in that case it seems that no actual row movement would occur, but we'd still have to do O(something)
to verify that. Is this correct? What's the something
in those cases? If we are returning 1M+ rows, does the something add up even if there's no swaps?
If something
ends up being slowish, it's worth considering delaying this PR so that can get fixed, then we can add a sorted label here.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is something I realized recently when chatting with Frank and Moritz about where peek responses spend time. You're right, we do always sort rows even without an ORDER BY
, what happens then is the tiebreaker gets called, which in this case is is just a comparison of the byte length of a row. We thought about not doing it but having rows come back in a deterministic way is nice especially for tests. In either case though we don't do any swaps in memory, we create a sorted "view" of the rows by sorting a Vec<usize>
, it turns out for 1M+ rows this can actually be quite slow.
We chatted about this in Slack, the proposed fix is doing the sorting in clusterd
and then environmentd
can do a streaming merge. This is another reason I wanted to add this metric, so we can more easily measure any improvements we try to make
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Counterpoint against determinism at this level: having the rows returned in a deterministic order can sometimes be bad for users who don't know that, by the spec, the if there's no ORDER BY we can return in any order. They can sometimes rely on an observed order in their application code, then if we change the implementation they get bit. Go, for example, does runtime randomization of map iteration order to prevent programs from relying on something out-of-spec.
If our tests need to rely on an order, they should be using the in-spec behavior using ORDER BY and not using out-of-spec behavior.
I'm ok merging is as, but think it's bad to have things, even tests, depend on out-of-spec behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great point, I filed https://github.com/MaterializeInc/materialize/issues/28261 to track and put it on the agenda for our team's weekly
} | ||
|
||
pub(crate) fn row_set_finishing_seconds(&self) -> Histogram { | ||
self.row_set_finishing_seconds.with_label_values(&[]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Counterpoint against determinism at this level: having the rows returned in a deterministic order can sometimes be bad for users who don't know that, by the spec, the if there's no ORDER BY we can return in any order. They can sometimes rely on an observed order in their application code, then if we change the implementation they get bit. Go, for example, does runtime randomization of map iteration order to prevent programs from relying on something out-of-spec.
If our tests need to rely on an order, they should be using the in-spec behavior using ORDER BY and not using out-of-spec behavior.
I'm ok merging is as, but think it's bad to have things, even tests, depend on out-of-spec behavior.
This PR adds a Prometheus histogram to instrument
RowSetFinishing::finish
.It also adds a
SessionMetrics
struct which is a new field onSession
as a way to pass around this new metric.RowSetFinishing::finish
is not always run from the context that has access to aCoordinator
orAdapterClient
, addingSessionMetrics
seemed like the cleanest way to move this metric around.Motivation
This is an action item that came out of the Incident Response Working Group. From a product quality perspective though we also don't have much visibility into how long this
RowSetFinishing::finish
currently takes.Tips for reviewers
Hide whitespace
Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way), then it is tagged with aT-proto
label.