-
Notifications
You must be signed in to change notification settings - Fork 465
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adapter: Record the amount of time it takes to run RowSetFinishing #28215
Merged
ParkMyCar
merged 2 commits into
MaterializeInc:main
from
ParkMyCar:adapter/instrument-row-set-finishing
Jul 15, 2024
Merged
Changes from all commits
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to propose we add a label here for
sorted=true|false
. Then I read the code and it looks like we always sort results even if a user didn't specify anORDER BY
? Although in that case it seems that no actual row movement would occur, but we'd still have to doO(something)
to verify that. Is this correct? What's thesomething
in those cases? If we are returning 1M+ rows, does the something add up even if there's no swaps?If
something
ends up being slowish, it's worth considering delaying this PR so that can get fixed, then we can add a sorted label here.Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is something I realized recently when chatting with Frank and Moritz about where peek responses spend time. You're right, we do always sort rows even without an
ORDER BY
, what happens then is the tiebreaker gets called, which in this case is is just a comparison of the byte length of a row. We thought about not doing it but having rows come back in a deterministic way is nice especially for tests. In either case though we don't do any swaps in memory, we create a sorted "view" of the rows by sorting aVec<usize>
, it turns out for 1M+ rows this can actually be quite slow.We chatted about this in Slack, the proposed fix is doing the sorting in
clusterd
and thenenvironmentd
can do a streaming merge. This is another reason I wanted to add this metric, so we can more easily measure any improvements we try to makeThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Counterpoint against determinism at this level: having the rows returned in a deterministic order can sometimes be bad for users who don't know that, by the spec, the if there's no ORDER BY we can return in any order. They can sometimes rely on an observed order in their application code, then if we change the implementation they get bit. Go, for example, does runtime randomization of map iteration order to prevent programs from relying on something out-of-spec.
If our tests need to rely on an order, they should be using the in-spec behavior using ORDER BY and not using out-of-spec behavior.
I'm ok merging is as, but think it's bad to have things, even tests, depend on out-of-spec behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great point, I filed https://github.com/MaterializeInc/materialize/issues/28261 to track and put it on the agenda for our team's weekly