-
Notifications
You must be signed in to change notification settings - Fork 121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Enh]: add cov
to Expr
and Series
#1607
Comments
thanks @e10v for the request! and sure, a separate |
Thanks for your request @e10v ! At first sight, it seems quite hard to support pandas native |
@FBruzzesi thank you for quick response! Just in case there is a |
@MarcoGorelli thank you for accepting the request! It's surely not urgent, at least for me. So even low priority would be good, considering the possible complexity of a solution. I've also created a separate feature request #1610 for |
Not too pleasing, yet the following solution does not incur in complex aggregations hence it will work for pyarrow as well: _GRP_SIZE = "_group_size__"
data = data.with_columns(**{
_DEMEAN.format(col): _demean_nw_col(col, group_col)
for col in covar_cols
})
# Pre computes left * right
data = data.with_columns(**{
_COV.format(left, right): (
nw.col(_DEMEAN.format(left)) * nw.col(_DEMEAN.format(right))
)
for left, right in cov_cols
})
# Covariance: numerator only
cov_expr = {
_COV.format(left, right): nw.col(_COV.format(left, right)).sum()
for left, right in cov_cols
}
# Group size: needed for covariance denominator
group_size_expr = {_GRP_SIZE: nw.len()}
all_expr = count_expr | mean_expr | var_expr | cov_expr | group_size_expr
result = (
data
.group_by(group_col)
.agg(**all_expr)
.with_columns(**{
_COV.format(left, right): nw.col(_COV.format(left, right))/(nw.col(_GRP_SIZE) -1)
for left, right in cov_cols
})
) |
@FBruzzesi thanks! In my case, count and group size are the same. So, I can just reuse it. Will fix it in the next version of tea-tasting. |
Awesome! I cut some part of the code and focused only on the covariance while implementing this workaround. I apologize if I missed other parts that can be re-used. Feel free to ping me/us in the tea-tasting PR if needed 🚀 |
Looks like it works: e10v/tea-tasting#110 |
Great to hear that it worked out 🚀 |
We would like to learn about your use case. For example, if this feature is needed to adopt Narwhals in an open source project, could you please enter the link to it below?
https://github.com/e10v/tea-tasting
Please describe the purpose of the new feature or describe the problem to solve.
In tea-tasting, there is a need to calculate variance and covariance. There is a workaround: here and here. But, apparently, it's not optimal for pandas dataframes. Currently this warning is suppressed in tea-tasting.
I see that variance is already
in progressadded in #1603 ❤️ Could you please also consider adding covariance as well? It's not critical: tea-tasting works without it. But it would make calculations faster for users who prefer pandas-like dataframes.Not all dataframes have a covariance function. For example, pyarrow doesn't have it (though it's still possible to calculate it the same way as tea-tasting does). In this case it would be very useful to have a method similar to Ibis
has_operation
(example usage). Could you consider adding it as well? I can create a separate feature request if needed.Suggest a solution if possible.
No response
If you have tried alternatives, please describe them below.
No response
Additional information that may help us understand your needs.
No response
The text was updated successfully, but these errors were encountered: