fix: pyarrow unique in group_by context #1076
base: main
thanks @FBruzzesi! I think plotly would only need to get some value from the aggregation, rather than a list dtype? perhaps we could allow
Something like this could help address the
Not sure if this is the right place for this discussion but here we go
Yes correct!
Not the biggest fan of this if we are going to support
Correct again.
yeah but it returns object dtype and I fear that'd create more issues for us down the line
Yes that's not ideal, and yesterday I had issues converting to list type (e.g.
Maybe let's sleep on this, but I would imagine that someone using narwhals should just be a bit more pedantic and do:
(
    df
    .group_by("a")
    .agg(nw.col("b").unique())
    .with_columns(nw.col("b").cast(nw.List(...)))  # force it to be list type
    ...  # now can access .list namespace
)
i'm not sure that people would think to do that explicit cast, and implementing the list namespace would be quite difficult for pandas. we may be able to take inspiration from duckdb here, who have
>>> rel = duckdb.read_parquet('../scratch/assets.parquet')
>>> duckdb.sql('select symbol, any_value(date) from rel group by symbol')
┌─────────┬─────────────────┐
│ symbol  │ any_value(date) │
│ varchar │      date       │
├─────────┼─────────────────┤
│ EWJ     │ 2022-01-31      │
│ OGN     │ 2022-01-31      │
│ PRU     │ 2022-01-31      │
│ AEP     │ 2022-01-31      │
│ ALLE    │ 2022-01-31      │
│ IEFM.L  │ 2022-01-31      │
│ EWG     │ 2022-01-31      │
│ SEGA.L  │ 2022-01-31      │
│ IAU     │ 2022-01-31      │
│ XLV     │ 2022-01-31      │
│    ·    │        ·        │
│    ·    │        ·        │
│    ·    │        ·        │
│ CNC     │ 2022-01-31      │
│ CTAS    │ 2022-01-31      │
│ DG      │ 2022-01-31      │
│ IEF     │ 2022-05-31      │
│ IEMG    │ 2022-01-31      │
│ JPEA.L  │ 2022-01-31      │
│ META    │ 2022-01-31      │
│ HIGH.L  │ 2022-03-17      │
│ HST     │ 2022-01-31      │
│ VXX     │ 2022-01-31      │
├─────────┴─────────────────┤
│ 100 rows (20 shown)       │
└───────────────────────────┘
So, my thinking was that if it's a top-level function (
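For reference, the aggregation in the DuckDB output above yields a single scalar per group rather than a list. A minimal pure-Python sketch of that idea follows. The helper name `any_value_per_group` and the sample rows are hypothetical (this is neither narwhals nor DuckDB API), and keeping the first value seen is just one possible choice for illustration.

```python
def any_value_per_group(rows, key, value):
    """Return one `value` per distinct `key` (hypothetical helper).

    This sketch keeps the first value encountered for each group;
    callers should not rely on which value is picked.
    """
    picked = {}
    for row in rows:
        # setdefault only stores the value the first time a key is seen.
        picked.setdefault(row[key], row[value])
    return picked


# Made-up sample rows, loosely shaped like the table above.
rows = [
    {"symbol": "EWJ", "date": "2022-01-31"},
    {"symbol": "EWJ", "date": "2022-02-28"},
    {"symbol": "IEF", "date": "2022-05-31"},
]
picked = any_value_per_group(rows, "symbol", "date")
```

The result has one entry per group and each entry is a plain scalar, so no list dtype (and no `.list` namespace) is involved.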
Just for clarity, when you say:
does it mean that
I haven't tried implementing it yet, but yes, I think so. Alternatively, we could add:
Alternatively, we could have our own
What type of PR is this? (check all applicable)
Related issues
May come in handy for plotly
Checklist
If you have comments or can explain your changes, please do so below.
I was not able to add tests... I tried to nest a bunch of checks, but the order inside the list type is not guaranteed.
Any idea?
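One possible way around the ordering problem in a test is to normalise each per-group list before comparing, for example by sorting it. A rough sketch in plain Python, assuming the aggregated result has already been converted to a dict of lists; `normalise_groups` and the sample data are hypothetical and not part of this PR:

```python
def normalise_groups(result):
    """Sort each group's list so the comparison ignores element order."""
    return {key: sorted(values) for key, values in result.items()}


# Hypothetical result of group_by("a").agg(nw.col("b").unique()),
# converted to a plain dict; the order inside each list is arbitrary.
result = {1: [3, 1, 2], 2: [5, 4]}
expected = {1: [1, 2, 3], 2: [4, 5]}

# Each list should contain unique values only...
assert all(len(v) == len(set(v)) for v in result.values())
# ...and match the expectation regardless of order within each list.
assert normalise_groups(result) == normalise_groups(expected)
```

Sorting assumes the values within a group are mutually comparable; comparing as sets would be an alternative when they are hashable.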