-
Notifications
You must be signed in to change notification settings - Fork 921
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert cudf.Scalar usage to pylibcudf and pyarrow usage #17686
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had some questions/suggestions, but I'm approving because overall the changes look good to me. And you said you plan to follow-up this PR to handle more scalar conversion cases.
@@ -36,7 +37,7 @@ def gather( | |||
|
|||
@acquire_spill_lock() | |||
def scatter( | |||
sources: list[ColumnBase | cudf.Scalar], | |||
sources: list[ColumnBase | ScalarLike], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change? I think ScalarLike
is an alias for Any
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it remains ScalarLike
, should we handle the different scalars we can get?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I changed this when trying to going down the rabbit hole of using pylibcudf.Scalar
here but it became more involved that I had thought. This was a leftover I forgot to revert
@@ -3255,7 +3255,7 @@ def duplicated( | |||
) | |||
distinct = libcudf.column.Column.from_pylibcudf(plc_column) | |||
result = copying.scatter( | |||
[cudf.Scalar(False, dtype=bool)], | |||
[cudf.Scalar(False)], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can make this plc.Scalar
. And then fix scatter
's implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, I'll plan to tackle this in a follow up
/merge |
I realized in #17686 that this refactoring might have sidestepped the caching benefits in the prior `cudf.Scalar` usage. This PR introduces `pa_scalar_to_plc_scalar` which caches the pyarrow to pylibcudf conversion in that PR. Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Matthew Murray (https://github.com/Matt711) URL: #17707
Description
A lot of
cudf.Scalar
usage is to eventually end up with a device scalar object (pylibcudf.Scalar
) to pass to a pylibcudf routine. The conversion logic to get there can be achieved by converting to apyarrow.Scalar
and usingpylibcudf.interop.from_arrow
. This way we offload a lot of scalar-conversion-logic incudf.Scalar
topyarrow.Scalar
which can further be converted using the interop method.This PR just tackles some straightforward cases of the above. Another PR will tackle the more involved cases
Checklist