Fix the ORC decoding bug for the timestamp data #17570

Merged

Conversation

kingcrimsontianyu
Contributor

@kingcrimsontianyu kingcrimsontianyu commented Dec 10, 2024

Description

This PR introduces a band-aid class, run_cache_manager, to handle an exceptional case for the TIMESTAMP data type, where the DATA stream (seconds) is processed ahead of the SECONDARY stream (nanoseconds) and the excess rows are lost. The fix uses run_cache_manager (and also cache_helper, which is an implementation detail) to cache the potentially missed data from the DATA stream and reuse it in the next decoding iteration, thus preventing data loss.
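
To illustrate the failure mode and the fix, here is a simplified host-side C++ model. It is a sketch under stated assumptions, not the CUDA kernel code in stripe_data.cu: the name decode_timestamps, the fixed per-iteration batch sizes, and the use_cache flag are all hypothetical and exist only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side model for illustration; not the libcudf kernel code.
struct timestamp_row {
  std::int64_t seconds;  // value from the DATA stream
  std::uint32_t nanos;   // value from the SECONDARY stream
};

// Each iteration decodes up to `sec_batch` seconds and up to `nano_batch`
// nanoseconds (fixed batch sizes are an assumption of this sketch). Only as
// many rows as both streams can supply are emitted. With `use_cache == false`,
// seconds decoded beyond that point are dropped, which models the bug. With
// `use_cache == true`, they are kept for the next iteration, which models the fix.
std::vector<timestamp_row> decode_timestamps(std::vector<std::int64_t> const& seconds,
                                             std::vector<std::uint32_t> const& nanos,
                                             std::size_t sec_batch,
                                             std::size_t nano_batch,
                                             bool use_cache)
{
  assert(seconds.size() == nanos.size());
  std::vector<timestamp_row> out;
  std::vector<std::int64_t> pending;  // decoded seconds not yet paired with nanoseconds
  std::size_t sec_pos  = 0;
  std::size_t nano_pos = 0;

  while (nano_pos < nanos.size()) {
    // Decode the next run of seconds and queue it behind anything still pending.
    std::size_t const sec_n = std::min(sec_batch, seconds.size() - sec_pos);
    pending.insert(pending.end(), seconds.data() + sec_pos, seconds.data() + sec_pos + sec_n);
    sec_pos += sec_n;

    // The nanosecond stream may supply fewer values in this iteration.
    std::size_t const nano_n = std::min(nano_batch, nanos.size() - nano_pos);
    std::size_t const rows   = std::min(pending.size(), nano_n);
    if (rows == 0) { break; }  // nothing left to pair, e.g. after seconds were dropped

    for (std::size_t i = 0; i < rows; ++i) {
      out.push_back({pending[i], nanos[nano_pos + i]});
    }
    nano_pos += rows;

    // Remove the seconds that were paired; the rest is the excess.
    pending.erase(pending.begin(), pending.begin() + static_cast<std::ptrdiff_t>(rows));
    if (!use_cache) { pending.clear(); }  // bug model: the excess is lost
  }
  return out;
}
```

With six rows, a seconds batch of 4, and a nanoseconds batch of 2, the uncached variant pairs the fifth second with the third nanosecond value and returns only four rows, while the cached variant returns all six rows in order.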

Closes #17155

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@kingcrimsontianyu kingcrimsontianyu added the bug (Something isn't working) and non-breaking (Non-breaking change) labels Dec 10, 2024
@kingcrimsontianyu kingcrimsontianyu self-assigned this Dec 10, 2024

copy-pr-bot bot commented Dec 10, 2024

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions bot added the libcudf (Affects libcudf (C++/CUDA) code.) label Dec 10, 2024
@kingcrimsontianyu
Contributor Author

/ok to test

@kingcrimsontianyu
Contributor Author

/ok to test

@kingcrimsontianyu
Contributor Author

/ok to test

@vuule vuule self-requested a review December 13, 2024 22:03
@kingcrimsontianyu
Contributor Author

/ok to test

@kingcrimsontianyu
Contributor Author

/ok to test

@kingcrimsontianyu
Contributor Author

/ok to test

@github-actions github-actions bot added the Python (Affects Python cuDF API.) label Dec 17, 2024
@kingcrimsontianyu kingcrimsontianyu marked this pull request as ready for review December 17, 2024 20:07
@kingcrimsontianyu kingcrimsontianyu requested review from a team as code owners December 17, 2024 20:07
Contributor

@vuule vuule left a comment

some old comment, not sure if applicable still

cpp/src/io/orc/stripe_data.cu
cpp/src/io/orc/stripe_data.cu Outdated
cpp/src/io/orc/stripe_data.cu Outdated
Contributor

@Matt711 Matt711 left a comment

Just a couple small suggestions.

cpp/src/io/orc/stripe_data.cu Outdated
python/cudf/cudf/tests/test_orc.py Outdated
@ttnghia
Contributor

ttnghia commented Dec 18, 2024

Should we run a benchmark on this patch to see how much performance impact it causes?

@vuule
Contributor

vuule commented Dec 18, 2024

> Should we run a benchmark on this patch to see how much performance impact it causes?

Are there Spark-RAPIDS benchmarks that we can (also) run to check the impact?

Member

@mhaseeb123 mhaseeb123 left a comment

Some comments. Overall looks good.

Contributor

@vyasr vyasr left a comment

I'll be on vacation for the next week and I don't want to block this PR, so I'm just leaving comments without requesting blocking changes. Feel free to ping me if you have thoughts though!

cpp/src/io/orc/stripe_data.cu Outdated
cpp/src/io/orc/stripe_data.cu Outdated
cpp/src/io/orc/stripe_data.cu Outdated
cpp/src/io/orc/stripe_data.cu Outdated
cpp/src/io/orc/stripe_data.cu Outdated
cpp/src/io/orc/stripe_data.cu Outdated
__shared__ run_cache_manager run_cache_manager_inst;
cache_helper cache_helper_inst(run_cache_manager_inst);
Contributor

Do we need to make any changes to the shared memory allocation upon launching the kernel?

Contributor Author

The size of the shared memory needed won't change throughout the kernel execution, hence the static allocation. Does this answer your question?

Contributor

Correct me if I'm wrong, but why doesn't the shared memory size of the kernel change when we add a new shared memory object?

Contributor

It has changed, but we don't need to declare this explicitly; the size of the shared memory is known at compile time, so CUDA takes care of this for us.

Contributor

Yeah this makes sense. Thanks.

Contributor Author

run_cache_manager increases the shared memory usage per block by 12 bytes. That is negligible compared to the existing usage by orcdec_state_s, which takes 37,776 bytes.
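
As a rough sanity check on those figures, the arithmetic can be sketched in plain C++. This counts only the two objects mentioned in this thread (any other shared memory the kernel may use is not included), and the constant and function names are hypothetical. The 48 KiB value is CUDA's per-block limit for statically allocated shared memory.

```cpp
#include <cassert>
#include <cstddef>

// Sizes quoted in this thread (bytes per block).
constexpr std::size_t orcdec_state_bytes      = 37776;
constexpr std::size_t run_cache_manager_bytes = 12;

// CUDA limit for statically allocated shared memory per block (48 KiB).
constexpr std::size_t static_smem_limit_bytes = 48 * 1024;

// Total static shared memory for the two objects discussed here
// (assumption: nothing else is counted).
constexpr std::size_t total_static_smem_bytes()
{
  return orcdec_state_bytes + run_cache_manager_bytes;
}

// Bytes still available below the static limit after adding the cache manager.
constexpr std::size_t static_smem_headroom_bytes()
{
  return static_smem_limit_bytes - total_static_smem_bytes();
}

// Because both sizes are compile-time constants, the check itself can be done
// at compile time, with no change to the kernel launch parameters.
static_assert(total_static_smem_bytes() <= static_smem_limit_bytes,
              "static shared memory would exceed the 48 KiB per-block limit");
```

Under these assumptions the total is 37,788 bytes, which leaves 11,364 bytes below the static limit.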

Contributor Author

@ttnghia Yeah, that's a lot of shared memory to use. The static arrays in orcdec_state_s, such as the intermediate "byte streams" and the intermediate decoded output, are the biggest contributors. This is required by the current design, where each block has a hardcoded 1024 threads so that it can consume two 512-length runs at a time. We may improve this part in the future if needed.

@kingcrimsontianyu
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 4e97cd4 into rapidsai:branch-25.02 Jan 7, 2025
116 of 117 checks passed
Member

@mhaseeb123 mhaseeb123 left a comment

LGTM!

Labels
bug: Something isn't working
libcudf: Affects libcudf (C++/CUDA) code.
non-breaking: Non-breaking change
Python: Affects Python cuDF API.
Projects
Status: Done
Status: No status
Development

Successfully merging this pull request may close these issues.

[BUG] Misaligned timestamps produced by ORC reader
6 participants