-
Notifications
You must be signed in to change notification settings - Fork 916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use 64-bit offsets only if the current strings column output chunk size exceeds threshold #17693
Use 64-bit offsets only if the current strings column output chunk size exceeds threshold #17693
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
@binmahone please check if this PR resolves the NVBug you are encountering. |
if (sz <= strings::detail::get_offset64_threshold()) { | ||
// only if it is not a large strings column | ||
if (col_string_sizes[idx] <= | ||
static_cast<size_t>(strings::detail::get_offset64_threshold())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strings::detail::get_offset64_threshold()
is always <= std::numeric_limits<int32_t>::max()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is true. The purpose is for testing (and requested by Spark) as way you could test with a lower threshold without needing to create a 2GB char array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep,I was trying to comment on why I changed to upcasting threshold
instead of downcasting col_string_sizes[idx]
as col_string_sizes
may be > 2B for large str cols.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cpp approval.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@mhaseeb123 Yesterday we verified it locally, it works! (many thanks!!!) But we need some more time to verify at customer side, as it requires some back-porting and re-packaging |
Thank you for update @binmahone. We will merge this PR once you confirm! |
hi @mhaseeb123 , after discussion with @GaryShen2008 , we suggest you to merge this PR before my next confirmation, as long as it meets your quality requirements. According to our latest brach model, we'll need this PR merged into 25.02-SNAPSHOT before we can do verification at customer side. We'll fire new issues in case it doesn't work out in customer env. |
/merge |
Description
This PR improves on #17207 and only uses 64-bit offsets if the current output chunk of a strings column exceeds the large-strings threshold instead of using cumulative strings column sizes per
pass
orrow group
level.Checklist