Instead of using dask delayed to align and map over the partitions of our catalogs, we could try using the ddf.partitions accessor to align the partitions as necessary and then call map_partitions over them. There are still open questions about how to handle empty partitions and divisions, but it may be an approach worth looking into.
I looked into this a while back. I switched our join and crossmatch implementations from delayed to map_partitions and saw some improvement in the time taken by lazy operations, but not an order-of-magnitude improvement.
Advantages:
Faster lazy operations
If dask allows specifying the necessary columns in map_partitions, the dask expressions query optimization could work through our functions
Disadvantages:
I think the code is less readable and less well organized than with delayed