GH-43953: [C++] Add tests based on random data and benchmarks to ChunkResolver::ResolveMany #43954
Results here (AMD Zen 2 CPU, gcc):
Ironically, uint32 seems slightly slower than both uint16 and uint64. Not sure whether that's due to the compiler or to the CPU.
template <typename IndexType>
void ResolveManySetArgs(benchmark::internal::Benchmark* bench) {
  constexpr int32_t kNonAligned = 3;
Can you explain what this is for?
The optimizations I was experimenting with involved some unrolling so I didn't want input values to neatly align to powers of 2.
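To illustrate the point about alignment, here is a minimal sketch assuming a 4-way unrolled loop. It is only an illustration, not the optimization that was tried in this PR, and `SumUnrolledBy4` is a hypothetical helper. When the input length is a multiple of the unroll factor, the scalar tail loop never runs, so a benchmark using only power-of-2 sizes would not measure it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: sums `values` with a 4-way unrolled main loop followed
// by a scalar tail loop. `tail_iterations` (optional) receives the number of
// elements handled by the tail loop.
inline int64_t SumUnrolledBy4(const std::vector<int32_t>& values,
                              int64_t* tail_iterations = nullptr) {
  const std::size_t n = values.size();
  const std::size_t unrolled_end = n - (n % 4);
  int64_t sum = 0;
  std::size_t i = 0;
  // Main loop: processes 4 elements per iteration.
  for (; i < unrolled_end; i += 4) {
    sum += values[i];
    sum += values[i + 1];
    sum += values[i + 2];
    sum += values[i + 3];
  }
  // Tail loop: only runs when n is not a multiple of 4.
  int64_t tail = 0;
  for (; i < n; ++i) {
    sum += values[i];
    ++tail;
  }
  if (tail_iterations != nullptr) *tail_iterations = tail;
  return sum;
}
```

With a length of 1024 the tail loop is skipped entirely, while 1024 + 3 forces three tail iterations, which is the kind of case a non-aligned offset is meant to cover.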
case 2:
case 4:
case 8:
  bench->Args({kChunkedArrayLength, /*num_chunks*/ 10000, kNumIndicesFew});
10000 chunks is really a lot and I'm not sure it's really useful to test different numbers of chunks. By accumulating different combinations of parameters we make the benchmark results less immediately readable.
The huge number is necessary (even though it's a bit unrealistic) to measure how effective the binary search is at reducing the search space (proportional to the number of chunks).
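For readers unfamiliar with the technique, resolving a logical index by binary search over cumulative chunk offsets can be sketched as follows. This is a simplified illustration, not Arrow's actual `ChunkResolver` code: `ChunkLocation`, `MakeOffsets` and `ResolveOne` are hypothetical names. Each lookup costs O(log num_chunks) comparisons, so the benefit over a linear scan only becomes clearly measurable with many chunks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type for this sketch.
struct ChunkLocation {
  int32_t chunk_index;
  int64_t index_in_chunk;
};

// Builds cumulative offsets: offsets[i] is the logical index of the first
// element of chunk i, and offsets.back() is the total length.
inline std::vector<int64_t> MakeOffsets(const std::vector<int64_t>& chunk_lengths) {
  std::vector<int64_t> offsets(chunk_lengths.size() + 1, 0);
  for (std::size_t i = 0; i < chunk_lengths.size(); ++i) {
    offsets[i + 1] = offsets[i] + chunk_lengths[i];
  }
  return offsets;
}

// Resolves one logical index to (chunk, index within chunk) by binary search.
inline ChunkLocation ResolveOne(const std::vector<int64_t>& offsets, int64_t index) {
  assert(offsets.size() >= 2);
  assert(index >= 0 && index < offsets.back());
  // upper_bound returns the first offset strictly greater than `index`;
  // the owning chunk is the one just before it. Empty chunks are skipped.
  auto it = std::upper_bound(offsets.begin(), offsets.end(), index);
  const auto chunk = static_cast<std::size_t>(it - offsets.begin() - 1);
  return {static_cast<int32_t>(chunk), index - offsets[chunk]};
}
```

With 10000 chunks, a lookup like this needs about 14 comparisons, whereas a linear scan from the first chunk would need up to 10000.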
cpp/src/arrow/chunked_array.cc
@@ -55,7 +55,7 @@ ChunkedArray::ChunkedArray(ArrayVector chunks, std::shared_ptr<DataType> type)
      << "cannot construct ChunkedArray from empty vector and omitted type";
    type_ = chunks_[0]->type();
  }

  ARROW_CHECK_LE(chunks.size(), std::numeric_limits<int>::max());
Is this limit useful if it ends up not making performance better anyway?
This is useful in ensuring we are not generating more chunks than can be addressed by the index type.
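As a rough sketch of the idea, a check that a chunk count is addressable by a given index type could look like the following. `ChunkCountFits` is a hypothetical helper written for this illustration and is not part of Arrow; the actual change uses `ARROW_CHECK_LE` against `std::numeric_limits<int>::max()` as shown in the diff above.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: returns true if every chunk index in [0, num_chunks)
// plus the count itself can be represented by IndexType. Both sides are
// widened to uint64_t to avoid signed/unsigned comparison pitfalls.
template <typename IndexType>
constexpr bool ChunkCountFits(std::size_t num_chunks) {
  return static_cast<uint64_t>(num_chunks) <=
         static_cast<uint64_t>(std::numeric_limits<IndexType>::max());
}
```

For example, `ChunkCountFits<uint16_t>(65536)` is false, while `ChunkCountFits<int32_t>(10000)` is true.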
Things are saner on the Apple M1 Pro, with 32-bit being the fastest.
NOTE: I have a lot of stuff running on the M1 during this benchmark :)
There are a number of CI failures that need fixing.
TIL
Yes, it's also quite dramatic for the actual Sort implementation :-)
There are still a couple CI failures that need fixing.
7aeaf3f to 462be72
for consistency with the codebase style
There are still a couple compilation errors on CI it seems :-)
@pitrou now it's all green. The only failure is unrelated.
Just a question: do you mean to keep the 64-bit to 32-bit chunk index change?
LGTM after a minor push. I'll merge if CI is green.
Yes I do.
After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 83f35de. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 16 possible false positives for unstable benchmarks that are known to sometimes produce them.
Rationale for this change
Improve tests and add benchmarks. I wrote the tests and benchmarks while trying to improve the performance of ResolveMany and failing at it.
What changes are included in this PR?
Tests, benchmarks, and changes that don't really affect performance but might unlock more optimization opportunities in the future.
Are these changes tested?
Yes.