Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Don't fallback to pandas after simpe hasattr check #17678

Open
MarcoGorelli opened this issue Jan 3, 2025 · 1 comment
Open

[FEA] Don't fallback to pandas after simpe hasattr check #17678

MarcoGorelli opened this issue Jan 3, 2025 · 1 comment
Labels
cudf.pandas Issues specific to cudf.pandas feature request New feature or request

Comments

@MarcoGorelli
Copy link
Contributor

MarcoGorelli commented Jan 3, 2025

Discovered in Narwhals

If I have a function like:

def process_df(df):
    if hasattr(df, 'with_columns'):
        # Do something. cudf/pandas wouldn't go here
        return df.with_columns(...)
    return df.groupby('a')['b'].sum()

and run it with cudf.pandas, then this is enough to cause a fallback to pandas - even though cudf would have been perfectly capable of executing it!

This isn't farfetched - you can find examples of libraries doing hasattr checks all over the place, e.g. this one in pymc

https://github.com/pymc-devs/pymc/blob/ae4a6292aa48d5722dba6ab422e1a3d895ca7bf7/pymc/data.py#L232-L248

Scikit-learn:

https://github.com/scikit-learn/scikit-learn/blob/c9aeb15f8f1c7c54ed4ef27c871f7167e2ce3077/sklearn/feature_selection/_base.py#L129-L131

Plotly:

https://github.com/plotly/plotly.py/blob/231edaa61f8590f715134d42ea0bc2f858dd713e/packages/python/plotly/plotly/express/_imshow.py#L303-L312


I asked about this on Slack, and got the response

I think we still need to fall back because we need to raise the same AttributeError that pandas would raise if it didn’t exist.

As a user, I consider falling back to pandas unnecessarily to be significantly worse than raising a slightly inconsistent AttributeError message - especially when using hasattr which is when the AttributeError would be caught anyway.

If exception raising needs to be perfectly consistent, perhaps there's a way to still raise it without needing to fallback to pandas? e.g.

def __getattr(self, attr):
    if hasattr(pandas.DataFrame, attr):
        # do the fallback, raise if necessary
    else:
        msg = f"{attr} is present in neither cuDF nor pandas"
        raise AttributeError(msg)
@mroeschke
Copy link
Contributor

mroeschke commented Jan 3, 2025

Just wanted to comment that this is an interesting class of optimization that I think would be nice for cudf.pandas - Can cudf.pandas not fall back if it's an exception that would raise in pandas too? Like this hasattr check, other cases might be:

  • Calling methods that don't exist in either library.
  • Invalid arguments
  • Known invalid operations between two types (e.g. binops between strings and lists)

@mroeschke mroeschke added the cudf.pandas Issues specific to cudf.pandas label Jan 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cudf.pandas Issues specific to cudf.pandas feature request New feature or request
Projects
Status: Todo
Development

No branches or pull requests

2 participants