Remove "legacy" Dask DataFrame support from Dask cuDF #17558
/ok to test
…to remove-legacy-dataframe
Packaging changes look good to me. Will defer to others on the substance of removing this support.
LGTM
"The legacy DataFrame API is not supported in dask_cudf>24.12. "
"Please enable query-planning, or downgrade to dask_cudf<=24.12"
Suggested change:
- "The legacy DataFrame API is not supported in dask_cudf>24.12. "
- "Please enable query-planning, or downgrade to dask_cudf<=24.12"
+ "The legacy DataFrame API is not supported in dask_cudf>24.12."
+ "Please enable query-planning, or downgrade to dask_cudf<=24.12"
Don't we want a space between the period and the first word of the next sentence?
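For context on the spacing question: Python joins adjacent string literals with no separator, so the trailing space inside the first literal is the only thing separating the two sentences. A minimal sketch using the two versions of the message quoted above (the variable names are only illustrative):

```python
# Adjacent string literals are concatenated at compile time with no separator.
with_space = (
    "The legacy DataFrame API is not supported in dask_cudf>24.12. "
    "Please enable query-planning, or downgrade to dask_cudf<=24.12"
)

# Same message with the trailing space removed from the first literal.
without_space = (
    "The legacy DataFrame API is not supported in dask_cudf>24.12."
    "Please enable query-planning, or downgrade to dask_cudf<=24.12"
)

print(with_space)
print(without_space)
```

The second message runs the two sentences together as `24.12.Please`.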
Update: This is blocked by rapidsai/dask-cuda#1417 (which is now ready for review).
…to remove-legacy-dataframe
Update: This PR now removes 2,000+ lines of code that would be essentially "dead" outside the legacy API.
/okay to test
/okay to test
Removes testing/handling for "legacy" Dask cuDF (i.e. `DASK_DATAFRAME__QUERY_PLANNING=False`). This PR also adds support for the `"explicit-comms"` config with query-planning enabled (we used to raise an error telling the user to disable query planning).

This should be merged **before** rapidsai/cudf#17558 (otherwise Dask-CUDA CI will break).

This PR is marked as "breaking", because it technically breaks the `"explicit-comms"` config with the "legacy" version of Dask cuDF (which we are about to remove in 25.02 anyway).

Authors:
- Richard (Rick) Zamora (https://github.com/rjzamora)
- Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
- Peter Andreas Entschev (https://github.com/pentschev)
- James Lamb (https://github.com/jameslamb)
- Mads R. B. Kristensen (https://github.com/madsbk)

URL: #1417
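To make the "legacy" switch concrete, here is a rough standard-library sketch of how a `DASK_DATAFRAME__QUERY_PLANNING=False` setting could be detected and rejected with the message quoted in the review thread. This is not the code from either PR: the function names are hypothetical, the `NotImplementedError` type is an assumption, and real code would read the `dataframe.query-planning` key through `dask.config` rather than parsing the environment by hand.

```python
import ast

ENV_VAR = "DASK_DATAFRAME__QUERY_PLANNING"

LEGACY_MESSAGE = (
    "The legacy DataFrame API is not supported in dask_cudf>24.12. "
    "Please enable query-planning, or downgrade to dask_cudf<=24.12"
)


def query_planning_setting(environ):
    """Return the parsed value of the env var, or None if it is unset."""
    raw = environ.get(ENV_VAR)
    if raw is None:
        return None
    try:
        # "False" -> False, "True" -> True
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        # Not a Python literal; hand back the raw string unchanged.
        return raw


def ensure_query_planning(environ):
    """Raise if the legacy API was explicitly requested (hypothetical helper)."""
    if query_planning_setting(environ) is False:
        # Exception type is an assumption for this sketch.
        raise NotImplementedError(LEGACY_MESSAGE)
```

Calling `ensure_query_planning(os.environ)` at import time would be one place to apply such a check; an unset variable passes through, since only an explicit `False` selects the legacy API in this sketch.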
Looks like python/dask_cudf/dask_cudf/io/tests/test_csv.py needs a copyright update; otherwise LGTM.
(I think after the dask-cuda update to remove legacy dask, this PR will be necessary to unblock other PRs? e.g. https://github.com/rapidsai/cudf/actions/runs/12643976310/job/35232330877?pr=17686)
Ah! I can't believe I forgot that we run distributed tests in dask_cudf. Yeah, we can get this in soon then, and I can do some cleanup in a follow-up PR (also need some cleanup in dask-cuda before we can unpin Dask).
/merge
Follow up to #17558.

This PR cleans up some imports and provides support for both `dask:2024.12.1` and `dask:main` (in which `dask_expr` has been moved into the `dask.dataframe` module).

See also: rapidsai/dask-cuda#1424

Authors:
- Richard (Rick) Zamora (https://github.com/rjzamora)

Approvers:
- Mads R. B. Kristensen (https://github.com/madsbk)
- Peter Andreas Entschev (https://github.com/pentschev)
- Bradley Dice (https://github.com/bdice)

URL: #17704
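Supporting two Dask versions where a module has moved usually comes down to an import fallback. The helper below is a generic sketch of that pattern, not the code from #17704; `import_first_available` is a hypothetical name, and the Dask module paths mentioned in the comment are assumptions about where `dask_expr` lives in each version.

```python
import importlib


def import_first_available(*module_names):
    """Import and return the first module in module_names that exists."""
    last_error = None
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as err:
            # Remember the failure and try the next candidate location.
            last_error = err
    raise ImportError(
        f"none of {module_names!r} could be imported"
    ) from last_error


# Assumed usage for the case described above (paths are assumptions):
#   dx = import_first_available("dask.dataframe.dask_expr", "dask_expr")
```

Listing the newer location first means the standalone package is only consulted when the in-tree module is missing.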
Description
The legacy Dask DataFrame API is deprecated. We should remove it for 25.02 to reduce maintenance burden.
Blockers:
Checklist