[Enh]: Spark Expr missing methods #1714
Comments
Hey @FBruzzesi, I'm working on implementing scalar methods and planning to work on the following methods. I want to first check that my thought process is "correct".
Thinking of implementing two patterns for these methods:

    # if predicate-based (e.g. drop_nulls, which uses predicate function `F.isnull`)
    def method(self) -> Self:
        def _method(_input: Column) -> Column:
            from pyspark.sql import functions as F  # noqa: N812

            return F.explode(F.filter(F.array(_input), <predicate_func>))

        return self._from_call(_method, "method", returns_scalar=False)

    # if not predicate-based (e.g. unique, which uses array function `F.array_distinct`)
    def method(self) -> Self:
        def _method(_input: Column) -> Column:
            from pyspark.sql import functions as F  # noqa: N812

            return F.explode(<array_func>(F.array(_input)))

        return self._from_call(_method, "method", returns_scalar=False)

Not sure how expensive doing this is or if it collides with future API developments. Lmk what you think.
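To reason about what the two patterns would return before running them on a cluster, here is a rough plain-Python model of the row-wise semantics. It is only a sketch and not PySpark: `to_array`, `explode`, `array_distinct`, `predicate_pattern` and `array_func_pattern` are hypothetical stand-ins, and it assumes that `F.array(col)` builds one single-element array per row, that `F.explode` emits one row per array element (so empty arrays produce no rows), and that `None` plays the role of a Spark NULL.

```python
from typing import Any, Callable, List, Optional

Value = Optional[Any]  # None stands in for a Spark NULL (assumption)


def to_array(value: Value) -> List[Value]:
    """Model of F.array(col): wrap each row's value in a one-element array."""
    return [value]


def explode(arrays: List[List[Value]]) -> List[Value]:
    """Model of F.explode: one output row per element; empty arrays vanish."""
    return [item for arr in arrays for item in arr]


def array_distinct(arr: List[Value]) -> List[Value]:
    """Model of F.array_distinct: de-duplicate within a single array."""
    seen: List[Value] = []
    for item in arr:
        if item not in seen:
            seen.append(item)
    return seen


def predicate_pattern(
    column: List[Value], keep: Callable[[Value], bool]
) -> List[Value]:
    """Model of F.explode(F.filter(F.array(col), keep))."""
    return explode([[v for v in to_array(value) if keep(v)] for value in column])


def array_func_pattern(
    column: List[Value], array_func: Callable[[List[Value]], List[Value]]
) -> List[Value]:
    """Model of F.explode(array_func(F.array(col)))."""
    return explode([array_func(to_array(value)) for value in column])


column = [1, None, 2, 2]

# Predicate pattern with a "keep non-null" predicate: null rows disappear.
dropped = predicate_pattern(column, lambda v: v is not None)
print(dropped)  # [1, 2, 2]

# Array-function pattern: each array holds a single value, so de-duplicating
# inside it cannot remove duplicates that live in different rows.
deduped = array_func_pattern(column, array_distinct)
print(deduped)  # [1, None, 2, 2]
```

Under these assumptions the predicate pattern behaves like a row filter, while the array-function pattern leaves cross-row duplicates in place, so something like `unique` may need a different approach. This is worth confirming against a real Spark session before relying on it.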
Thanks @lucas-nelson-uiuc for your efforts here! Can we leave the row-order dependent ones out for now, and make sure we've got everything done from the others first? There are some broader API decisions we need to make for those.
Got a working version for the following - all support the Polars examples and
Amazing stuff @lucas-nelson-uiuc! Looking forward to those as well!

    if "pyspark" in str(constructor):
        request.applymarker(pytest.mark.xfail)
FYI I am working on |
tried adding
lmk if I'm missing something |
Methods with one asterisk (*) are row-order dependent and should be deprioritized for now, until a decision for the lazy API is reached (see stable v2 discussion).
Methods with two asterisks (**) denote a namespace. The methods inside each namespace are not listed here, and all of them are missing as of now.
High priority:
Deprioritized:
Namespaces: