GH-45254: [C++][Acero] Fix the row offset truncation in row table merge (#45255)

### Rationale for this change

See #45254

### What changes are included in this PR?

First, modify the test case to expose the suspected bug.

Then, apply the fix in the source.

### Are these changes tested?

By existing tests.

### Are there any user-facing changes?

None.

* GitHub Issue: #45254

Authored-by: Rossi Sun
Signed-off-by: Rossi Sun
zanmato1984 authored Jan 14, 2025
1 parent ef00568 commit ea47172
Showing 2 changed files with 5 additions and 3 deletions.
`cpp/src/arrow/acero/hash_join_node_test.cc`: 4 changes (3 additions, 1 deletion)
```diff
@@ -3370,8 +3370,10 @@ TEST(HashJoin, LARGE_MEMORY_TEST(BuildSideOver4GBVarLength)) {
   constexpr int value_no_match_length_min = 128;
   constexpr int value_no_match_length_max = 129;
   constexpr int value_match_length = 130;
+  // The value "DDD..." will be hashed to the partition over 4GB of the hash table.
+  // Matching at this area gives us more coverage.
   const auto value_match =
-      std::make_shared<StringScalar>(std::string(value_match_length, 'X'));
+      std::make_shared<StringScalar>(std::string(value_match_length, 'D'));
   constexpr int16_t num_rows_per_batch_left = 128;
   constexpr int16_t num_rows_per_batch_right = 4096;
   const int64_t num_batches_left = 8;
```
`cpp/src/arrow/acero/swiss_join.cc`: 4 changes (2 additions, 2 deletions)
```diff
@@ -439,11 +439,11 @@ Status RowArrayMerge::PrepareForMerge(RowArray* target,
     num_rows = 0;
     num_bytes = 0;
     for (size_t i = 0; i < sources.size(); ++i) {
-      target->rows_.mutable_offsets()[num_rows] = static_cast<uint32_t>(num_bytes);
+      target->rows_.mutable_offsets()[num_rows] = num_bytes;
       num_rows += sources[i]->rows_.length();
       num_bytes += sources[i]->rows_.offsets()[sources[i]->rows_.length()];
     }
-    target->rows_.mutable_offsets()[num_rows] = static_cast<uint32_t>(num_bytes);
+    target->rows_.mutable_offsets()[num_rows] = num_bytes;
   }
 
   return Status::OK();
```
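To illustrate why the removed `static_cast<uint32_t>` matters, here is a minimal, self-contained sketch of the boundary-offset loop in `PrepareForMerge`. It does not use Arrow: `MergeBoundaryOffsets` and `OffsetsAreMonotonic` are hypothetical names, `source_bytes[i]` stands in for `sources[i]->rows_.offsets()[sources[i]->rows_.length()]`, and it assumes the target offsets array holds 64-bit values (as the cast-free assignment in the fix implies; the offsets type itself is not shown in this diff). Only the boundary offsets are computed (the values the loop writes at `mutable_offsets()[num_rows]`), not the per-row offsets inside each source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not Arrow API. Returns the offset of the first row of
// each source, followed by the end offset of the merged row table.
// `source_bytes[i]` is the total size of source i's row data.
// If `truncate_to_32_bits` is true, every stored offset goes through
// static_cast<uint32_t>, mimicking the code before the fix.
std::vector<int64_t> MergeBoundaryOffsets(const std::vector<int64_t>& source_bytes,
                                          bool truncate_to_32_bits) {
  auto store = [truncate_to_32_bits](int64_t value) -> int64_t {
    // Converting to uint32_t keeps only the low 32 bits (value mod 2^32).
    return truncate_to_32_bits ? static_cast<int64_t>(static_cast<uint32_t>(value))
                               : value;
  };

  std::vector<int64_t> boundaries;
  boundaries.reserve(source_bytes.size() + 1);
  int64_t num_bytes = 0;  // running byte total, kept in 64 bits throughout
  for (int64_t bytes : source_bytes) {
    boundaries.push_back(store(num_bytes));  // first-row offset of this source
    num_bytes += bytes;
  }
  boundaries.push_back(store(num_bytes));  // end offset of the merged table
  return boundaries;
}

// Row offsets must never decrease; 32-bit truncation can break this.
bool OffsetsAreMonotonic(const std::vector<int64_t>& offsets) {
  return std::is_sorted(offsets.begin(), offsets.end());
}
```

For example, with `kGiB = int64_t{1} << 30` and two sources of `3 * kGiB` bytes each, the untruncated boundaries are `{0, 3 GiB, 6 GiB}`, while the truncated ones are `{0, 3 GiB, 2 GiB}`: any offset at or beyond 4 GiB wraps modulo 2^32, so rows stored past the 4 GB mark would be addressed at the wrong place, which is the over-4GB region the updated test comment refers to.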
