fix: modin dtype interoperability #1692
- preserve dtype backend when casting on modin natives
- groupby backends handle own __iter__
no better feeling than seeing a failing CI job and then reading [XPASS(strict)]
well done! this is awesome
indices = self._grouped.indices
if (
    self._df._implementation is Implementation.PANDAS
    and self._df._backend_version < (2, 2)
) or (
    self._df._implementation is Implementation.CUDF
    and self._df._backend_version < (2024, 12)
):  # pragma: no cover
    for key in indices:
        yield (key, self._from_native_frame(self._grouped.get_group(key)))
else:
    for key in indices:
        key = tupleify(key)  # noqa: PLW2901
        yield (key, self._from_native_frame(self._grouped.get_group(key)))
for key, group in self._grouped:
    yield (key, self._from_native_frame(group))
legend, thanks so much @camriddell !
i've got something in progress for #1690 which should hopefully improve things for all these backends in tests
What type of PR is this? (check all applicable)
Related issues
Checklist
If you have comments or can explain your changes, please do so below
Narwhals failed to preserve pyarrow-backed dtypes on Modin-native DataFrames. This issue was first noticed in the aforementioned PR and subsequent MRE.
With this PR, we appropriately preserve the dtype backend.
An edge case was encountered in narwhals/_pandas_like/group_by.py where .get_group(...) would raise a KeyError if the passed key contained float("nan") (as it does in the testing suite). This is likely due to some copying/reconstruction of the nan object, which fails to reproduce its original hash since its hash is computed against the object id.
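The lookup failure described above can be reproduced in plain Python, without pandas or Modin. This is only a minimal sketch of the suspected mechanism, not the code path inside .get_group(...); the names nan_original, nan_copy and groups are made up for illustration. Because nan never compares equal to itself, a dict can only find a nan key through the identity shortcut, so a reconstructed nan object misses.

```python
# Two separately constructed NaN objects: same value, different identity.
nan_original = float("nan")
nan_copy = float("nan")

# A stand-in for a mapping from group keys to groups (hypothetical).
groups = {nan_original: "rows whose key is NaN"}

# Lookup with the very same object succeeds via the identity check.
found_with_original = nan_original in groups

# Lookup with a reconstructed NaN fails: NaN != NaN, and the objects differ.
found_with_copy = nan_copy in groups

try:
    groups[nan_copy]
    raised_key_error = False
except KeyError:
    raised_key_error = True

print(found_with_original, found_with_copy, raised_key_error)
```

Iterating over the grouped object directly, as in `for key, group in self._grouped`, sidesteps this because no key is ever looked up again.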