[FEA] Don't fallback to pandas after simpe `hasattr` check #17678

MarcoGorelli · 2025-01-03T09:31:05Z

Discovered in Narwhals

If I have a function like:

def process_df(df):
    if hasattr(df, 'with_columns'):
        # Do something. cudf/pandas wouldn't go here
        return df.with_columns(...)
    return df.groupby('a')['b'].sum()

and run it with cudf.pandas, then this is enough to cause a fallback to pandas - even though cudf would have been perfectly capable of executing it!

This isn't farfetched - you can find examples of libraries doing hasattr checks all over the place, e.g. this one in pymc

https://github.com/pymc-devs/pymc/blob/ae4a6292aa48d5722dba6ab422e1a3d895ca7bf7/pymc/data.py#L232-L248

Scikit-learn:

https://github.com/scikit-learn/scikit-learn/blob/c9aeb15f8f1c7c54ed4ef27c871f7167e2ce3077/sklearn/feature_selection/_base.py#L129-L131

Plotly:

https://github.com/plotly/plotly.py/blob/231edaa61f8590f715134d42ea0bc2f858dd713e/packages/python/plotly/plotly/express/_imshow.py#L303-L312

I asked about this on Slack, and got the response

I think we still need to fall back because we need to raise the same AttributeError that pandas would raise if it didn’t exist.

As a user, I consider falling back to pandas unnecessarily to be significantly worse than raising a slightly inconsistent AttributeError message - especially when using hasattr which is when the AttributeError would be caught anyway.

If exception raising needs to be perfectly consistent, perhaps there's a way to still raise it without needing to fallback to pandas? e.g.

def __getattr(self, attr):
    if hasattr(pandas.DataFrame, attr):
        # do the fallback, raise if necessary
    else:
        msg = f"{attr} is present in neither cuDF nor pandas"
        raise AttributeError(msg)

The text was updated successfully, but these errors were encountered:

mroeschke · 2025-01-03T19:59:23Z

Just wanted to comment that this is an interesting class of optimization that I think would be nice for cudf.pandas - Can cudf.pandas not fall back if it's an exception that would raise in pandas too? Like this hasattr check, other cases might be:

Calling methods that don't exist in either library.
Invalid arguments
Known invalid operations between two types (e.g. binops between strings and lists)

MarcoGorelli added the feature request New feature or request label Jan 3, 2025

MarcoGorelli mentioned this issue Jan 3, 2025

[BUG] using python -m cudf.pandas and calling hasattr converts NA to NaN #17666

Open

mroeschke added the cudf.pandas Issues specific to cudf.pandas label Jan 3, 2025

github-project-automation bot added this to cuDF Python Jan 3, 2025

github-project-automation bot moved this to Todo in cuDF Python Jan 3, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEA] Don't fallback to pandas after simpe `hasattr` check #17678

[FEA] Don't fallback to pandas after simpe `hasattr` check #17678

MarcoGorelli commented Jan 3, 2025 •

edited

Loading

mroeschke commented Jan 3, 2025 •

edited

Loading

[FEA] Don't fallback to pandas after simpe hasattr check #17678

[FEA] Don't fallback to pandas after simpe hasattr check #17678

Comments

MarcoGorelli commented Jan 3, 2025 • edited Loading

mroeschke commented Jan 3, 2025 • edited Loading

[FEA] Don't fallback to pandas after simpe `hasattr` check #17678

[FEA] Don't fallback to pandas after simpe `hasattr` check #17678

MarcoGorelli commented Jan 3, 2025 •

edited

Loading

mroeschke commented Jan 3, 2025 •

edited

Loading