Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-43926: [C++] Compute: RowEncoder eliminates offsets when all columns are fixed-sized #43931

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

mapleFU
Copy link
Member

@mapleFU mapleFU commented Sep 3, 2024

Rationale for this change

Not use offsets_ vector when all columns in RowEncoder is fixed-width.

This:

  1. Reducing the memory usage
  2. Reducing calling to AddLength

This might enlarge 8-byte memory in RowEncoder.

What changes are included in this PR?

Not use offsets_ vector when all columns in RowEncoder is fixed-width.

Are these changes tested?

Covered by existing

Are there any user-facing changes?

no

@mapleFU mapleFU requested a review from westonpace as a code owner September 3, 2024 03:34
Copy link

github-actions bot commented Sep 3, 2024

⚠️ GitHub issue #43926 has been automatically assigned in GitHub to PR creator.

@mapleFU mapleFU force-pushed the row-encoder-optimize-fixed-width branch from f8953d5 to cd30962 Compare September 3, 2024 03:35
@mapleFU mapleFU requested a review from zanmato1984 September 3, 2024 03:35
@mapleFU mapleFU force-pushed the row-encoder-optimize-fixed-width branch from cd30962 to 7deeb8c Compare September 3, 2024 03:43
@pitrou
Copy link
Member

pitrou commented Sep 3, 2024

What does performance look like?

@mapleFU
Copy link
Member Author

mapleFU commented Sep 3, 2024

CI need fix

D:\a\arrow\arrow\build\cpp\src\arrow\CMakeFiles\arrow_compute_shared.dir\Unity\unity_4_cxx.cxx
D:/a/arrow/arrow/cpp/src/arrow/compute/row/row_encoder_internal.cc(359): error C2220: the following warning is treated as an error
D:/a/arrow/arrow/cpp/src/arrow/compute/row/row_encoder_internal.cc(359): warning C4244: '+=': conversion from 'const int64_t' to 'int32_t', possible loss of data

Copy link
Contributor

@zanmato1984 zanmato1984 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some preliminary comments, mostly nits.

cpp/src/arrow/compute/row/row_encoder_internal.h Outdated Show resolved Hide resolved
cpp/src/arrow/compute/row/row_encoder_internal.h Outdated Show resolved Hide resolved
@@ -132,6 +147,8 @@ struct ARROW_EXPORT DictionaryKeyEncoder : FixedWidthKeyEncoder {
Result<std::shared_ptr<ArrayData>> Decode(uint8_t** encoded_bytes, int32_t length,
MemoryPool* pool) override;

// Uses `GetEncoderInfo` in `FixedWidthKeyEncoder`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this line for?

cpp/src/arrow/compute/row/row_encoder_internal.h Outdated Show resolved Hide resolved
cpp/src/arrow/compute/row/row_encoder_internal.h Outdated Show resolved Hide resolved
cpp/src/arrow/compute/row/row_encoder_internal.h Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Sep 3, 2024
@@ -83,6 +83,13 @@ struct ARROW_EXPORT KeyEncoder {
static bool IsNull(const uint8_t* encoded_bytes) {
return encoded_bytes[0] == kNullByte;
}

struct EncoderInfo {
Copy link
Contributor

@zanmato1984 zanmato1984 Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest using a more descriptive name - XxInfo generally doesn't give much information about Xx. Maybe something like EncodeColumnMeta?

@mapleFU
Copy link
Member Author

mapleFU commented Sep 6, 2024

I would go out for tour from 9.14 to 9.21. I'll focus on List Join these days, mark this as wip first

@mapleFU mapleFU marked this pull request as draft September 6, 2024 07:24
@mapleFU
Copy link
Member Author

mapleFU commented Sep 8, 2024

With:

struct BenchmarkSettings {
  int num_threads = 1;
  JoinType join_type = JoinType::INNER;
  // Change to 'true' to benchmark alternative, non-default and less optimized version of
  // a hash join node implementation.
  bool use_basic_implementation = true;

After:

BM_HashJoinBasic_KeyTypes/"{int32}"/Hashtable krows:1        166477 ns       166453 ns         4154 rows/sec=6.15188M/s
BM_HashJoinBasic_KeyTypes/"{int32}"/Hashtable krows:8       1439597 ns      1439500 ns          487 rows/sec=5.69086M/s
BM_HashJoinBasic_KeyTypes/"{int32}"/Hashtable krows:64     13310182 ns     13310009 ns           54 rows/sec=4.92381M/s
BM_HashJoinBasic_KeyTypes/"{int32}"/Hashtable krows:512   354529054 ns    354520400 ns            2 rows/sec=1.47887M/s
BM_HashJoinBasic_KeyTypes/"{int32}"/Hashtable krows:4096 6766162518 ns   6763357800 ns            1 rows/sec=620.151k/s
BM_HashJoinBasic_KeyTypes/"{utf8}"/Hashtable krows:1         191892 ns       191763 ns         3584 rows/sec=5.33993M/s
BM_HashJoinBasic_KeyTypes/"{utf8}"/Hashtable krows:4         791714 ns       791462 ns          875 rows/sec=5.17523M/s
BM_HashJoinBasic_KeyTypes/"{utf8}"/Hashtable krows:16       3627607 ns      3627186 ns          196 rows/sec=4.517M/s
BM_HashJoinBasic_KeyTypes/"{utf8}"/Hashtable krows:64      22743811 ns     22741736 ns           33 rows/sec=2.88175M/s

Before:

BM_HashJoinBasic_KeyTypes/"{int32}"/Hashtable krows:1        162843 ns       162781 ns         4207 rows/sec=6.29067M/s
BM_HashJoinBasic_KeyTypes/"{int32}"/Hashtable krows:8       1421683 ns      1421186 ns          493 rows/sec=5.7642M/s
BM_HashJoinBasic_KeyTypes/"{int32}"/Hashtable krows:64     13439032 ns     13438304 ns           52 rows/sec=4.87681M/s
BM_HashJoinBasic_KeyTypes/"{int32}"/Hashtable krows:512   408234871 ns    408206400 ns            2 rows/sec=1.28437M/s
BM_HashJoinBasic_KeyTypes/"{int32}"/Hashtable krows:4096 6854617796 ns   6852387200 ns            1 rows/sec=612.094k/s
BM_HashJoinBasic_KeyTypes/"{utf8}"/Hashtable krows:1         188940 ns       188805 ns         3716 rows/sec=5.42359M/s
BM_HashJoinBasic_KeyTypes/"{utf8}"/Hashtable krows:4         759144 ns       758925 ns          918 rows/sec=5.39711M/s
BM_HashJoinBasic_KeyTypes/"{utf8}"/Hashtable krows:16       3477411 ns      3476909 ns          202 rows/sec=4.71223M/s
BM_HashJoinBasic_KeyTypes/"{utf8}"/Hashtable krows:64      20586176 ns     20585156 ns           34 rows/sec=3.18365M/s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants