[BUG] using python -m cudf.pandas and calling hasattr converts NA to NaN
#17666
Comments
Thanks for the report. Possibly more simply, once the
In [1]: %load_ext cudf.pandas
In [2]: import pandas as pd
...: df = pd.DataFrame({
...: "a": ["a", "a", "b", "b", "b"],
...: "b": [1, 2, None, 5, 3],
...: "c": [5, 4, 3, 2, 1],
...: })
In [3]: df
Out[3]:
a b c
0 a 1.0 5
1 a 2.0 4
2 b <NA> 3
3 b 5.0 2
4 b 3.0 1
In [4]: df._fsproxy_slow
Out[4]:
a b c
0 a 1.0 5
1 a 2.0 4
2 b NaN 3
3 b 5.0 2
4 b 3.0 1
In [5]: df
Out[5]:
a b c
0 a 1.0 5
1 a 2.0 4
2 b NaN 3
3 b 5.0 2
4 b 3.0 1
I'll check tomorrow, but I think it was actually affecting results (e.g.
Yup, here's a repro which better demonstrates the issue:
src = """
import pandas as pd
df = pd.DataFrame({
"a": ["a", "a", "b", "b", "b"],
"b": [1, 2, None, 5, 3],
"c": [5, 4, 3, 2, 1],
})
print(df)
print(df.groupby('a')['b'].cumsum())
print(hasattr(df, 'foobar'))
print(df)
print(df.groupby('a')['b'].cumsum())
"""
with open('f.py', 'w', encoding='utf-8') as fd:
    fd.write(src)
The output is
So, we go from
to
Ah OK, thanks for the additional repro. I think when
In [2]: import cudf
In [3]: cudf.DataFrame([1, None]).dtypes
Out[3]:
0 int64
dtype: object
In [4]: cudf.DataFrame.from_pandas(cudf.DataFrame([1, None]).to_pandas()).dtypes
Out[4]:
0 float64
dtype: object
This is because of the nan_as_null parameter that is present during the round-trip. I'm working on a fix.
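To make the round-trip point concrete, here is a minimal pure-Python sketch of how a null can turn into a NaN when a nullable column is converted to a NaN-based representation and back. It is a toy model, not cudf's implementation: `to_host`, `from_host`, and `render` are hypothetical helpers, and only the name `nan_as_null` is borrowed from the comment above. Which value of that flag is actually in effect during the cudf.pandas round-trip is not stated in this thread.

```python
import math

# Toy model of a nullable column: a list of values plus a validity mask.
# NOT cudf's implementation; every helper name here is hypothetical.

def to_host(values, valid):
    """Mimic a device -> host conversion: nulls become float NaN."""
    return [float(v) if ok else math.nan for v, ok in zip(values, valid)]

def from_host(host_values, nan_as_null):
    """Mimic a host -> device conversion.

    nan_as_null=True:  NaN is read back as a null (mask entry False).
    nan_as_null=False: NaN stays an ordinary float value.
    """
    values, valid = [], []
    for v in host_values:
        if math.isnan(v) and nan_as_null:
            values.append(0.0)   # placeholder; hidden by the mask
            valid.append(False)
        else:
            values.append(v)
            valid.append(True)
    return values, valid

def render(values, valid):
    """Format a column the way a repr would: <NA> for nulls, NaN for NaN."""
    out = []
    for v, ok in zip(values, valid):
        if not ok:
            out.append("<NA>")
        elif math.isnan(v):
            out.append("NaN")
        else:
            out.append(str(v))
    return out

original = ([1.0, 2.0, 0.0, 5.0, 3.0], [True, True, False, True, True])
host = to_host(*original)

kept_null = from_host(host, nan_as_null=True)
lost_null = from_host(host, nan_as_null=False)

print(render(*original))   # null shown as <NA>
print(render(*kept_null))  # null survives the round-trip
print(render(*lost_null))  # null has become NaN
```

In this sketch the null only survives when NaN is mapped back to null on the way in; otherwise the column comes back with a real NaN value, which is the kind of change seen in the outputs above.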
Sure, thanks.
No objections to fixing it like this, but I think falling back to pandas after a simple
Falling back to pandas just for the sake of raising an error message (which gets discarded by
EDIT: I've made a separate issue about this: #17678
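For readers wondering why a plain `hasattr` can have any effect at all: `hasattr(obj, name)` calls `getattr` and swallows the `AttributeError`, so any work done inside `__getattr__` still happens. The sketch below is a toy proxy, assuming a fallback that converts the data lossily and then keeps the converted copy. The classes `FastFrame`, `SlowFrame`, and `Proxy` are hypothetical and do not mirror the cudf.pandas internals.

```python
class FastFrame:
    """Stand-in for the accelerated object; None marks a null."""
    def __init__(self, data):
        self.data = data


class SlowFrame:
    """Stand-in for the fallback object; it can only represent NaN."""
    def __init__(self, data):
        self.data = data


class Proxy:
    """Toy proxy: try the fast object first, fall back to a slow copy."""

    def __init__(self, fast):
        self._fast = fast
        self._slow = None

    def __getattr__(self, name):
        # Only reached when normal attribute lookup on the proxy fails.
        try:
            return getattr(self._fast, name)
        except AttributeError:
            # Assumed fallback path: build the slow object (lossy: None -> NaN)...
            self._slow = SlowFrame(
                [float("nan") if v is None else v for v in self._fast.data]
            )
            # ...and rebuild the fast object from it, so the loss sticks.
            self._fast = FastFrame(self._slow.data)
            # This raises AttributeError for unknown names, which hasattr discards.
            return getattr(self._slow, name)


proxy = Proxy(FastFrame([1, 2, None, 5, 3]))
before = list(proxy.data)
found = hasattr(proxy, "foobar")  # looks harmless, but triggers the fallback
after = list(proxy.data)

print(before)
print(found)
print(after)
```

The error raised on the slow path is thrown away by `hasattr`, yet the conversion it triggered has already replaced the null with NaN. Avoiding the fallback entirely for a failed attribute lookup would sidestep this in the toy model.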
Fixes: #17666
This PR ensures we convert all nulls to NaNs in float columns only in pandas compatibility mode.
Authors:
- GALI PREM SAGAR (https://github.com/galipremsagar)
Approvers:
- Matthew Roeschke (https://github.com/mroeschke)
- Matthew Murray (https://github.com/Matt711)
URL: #17677
Describe the bug
Here's a complete reproduction: https://colab.research.google.com/drive/1E2bWuCZhuMK_t_aevsWQhbUysSF8hsHt?usp=sharing
If I then run
then I get
Spotted in Narwhals
Expected behavior
Using hasattr should not change the contents of the dataframe.