Jan 18, 2025: This week(s) in DataFusion #14179
Comments
@2010YOUY01 became a committer: https://lists.apache.org/thread/b7df0bpzyzzcg6ph50swx7jw0b5dks75 🎉
@mbrobbel has a nice proposal to add extension types in arrow-rs (which would potentially help ExtensionTypes in DataFusion). Would appreciate any feedback:
EPIC EPIC work from @logan-keede @berkaysynnada and @buraksenn hacking out the physical-optimizer into its own crate (finally!)
@edmondop mentions
This is a neat "zero lake" idea (DataFusion via WASM in browser) https://gh-sparkling-cherry-6975.fly.dev/
@mertak-synnada has a PR that nicely refactors the data source code but may be a non-trivial downstream API change
This is a pretty neat looking dataframe library built on DataFusion:
If anyone else is interested in helping improve build times, @waynexia is starting to organize a project:
Does this mean that some joins and window clauses now get pushed down to the TableProvider? |
Introduction
This ticket is my weekly-ish summary of interesting things happening in DataFusion. Note this is not a complete list (it is what I remember / can find). Please leave comments on this ticket about things that I may have missed or you think should get wider attention by the community. Follow on to #13970
Reminder: find new content (and please post some!) on the Concepts, Readings, Events page
Community Highlights
DISCUSSION: January 2025 DataFusion Meetup in Amsterdam / CIDR 2025 #12988
Releases!
DataFusion 44 is released (and the upgrade to delta was very smooth chore: upgrade to datafusion 43 delta-io/delta-rs#2886 )
DataFusion python 44 released: chore: Upgrade to DataFusion 44 datafusion-python#972
We are starting to work on Release DataFusion 45.0.0 #14008 (we'll likely start testing in earnest the week of Jan 24)
Arrow minor release in progress: Release arrow-rs / parquet minor version 54.1.0 (Jan 2025) arrow-rs#6929
Also, @Owen-CH-Leung updated DataFusion to Arrow 54 🚀 Upgrade arrow-rs, parquet to 54.0.0 and pyo3 to 0.23.3 #14153
Performance
DataFusion's core value proposition is great performance without having to re-implement it yourself
find_in_set function #14020
Improve performance of reverse function #14025
PARTITION BY window clauses: feat(optimizer): Enable filter pushdown on window functions #14026 (🙌)
Quality
sqlite test suite
Bug Fixes
DataFusion is in the "we are finding all the corner case bugs now" phase of its life and people are now bashing them down
arrow-distinct now work with null rows #13966 @rluvaton
UNION and ORDER BY queries #13748, getting the sort code in good shape while I prepare to optimize it: Encapsulate fields of EquivalenceProperties #14040
nth_value when ignoreNulls is true and no nulls in values #14042 and Handle empty rows for array_distinct #13810
Cleanups 🧹
Now that we have a large, useful codebase it is also important to keep it neat and tidy, so we spend a non-trivial amount of time there too.
StringArrayType #14023 🧹
sql_select_to_rex() #14088
SanityChecker into physical-optimizer crate #14083 and Move JoinSelection into datafusion-physical-optimizer crate #14073
Features
Inline documentation macros
Substrait!
External Sort (aka really large memory) improvements
@2010YOUY01 and @Lordworms are beginning to work on improving out of core sorting. There are several great PRs up and outstanding:
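For anyone new to the idea, out-of-core (external) sorting can be sketched roughly as follows: sort bounded chunks in memory, spill each chunk to disk as a sorted run, then stream a k-way merge over the runs. This is only a minimal illustration of the classic textbook technique, not how DataFusion implements it; the function name `external_sort` and the `max_in_memory` parameter are hypothetical and exist only for this example.

```python
import heapq
import os
import tempfile


def external_sort(values, max_in_memory=4):
    """Yield the ints in `values` in sorted order, buffering at most
    `max_in_memory` of them at a time before spilling to disk.

    Hypothetical helper for illustration only; not DataFusion's API.
    """
    run_paths = []
    buffer = []

    def spill(buf):
        # Sort one buffer in memory and write it out as a sorted run.
        buf.sort()
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as f:
            for v in buf:
                f.write(f"{v}\n")
        run_paths.append(path)

    # Phase 1: build sorted runs of bounded size.
    for v in values:
        buffer.append(v)
        if len(buffer) >= max_in_memory:
            spill(buffer)
            buffer = []
    if buffer:
        spill(buffer)

    def read_run(path):
        # Stream a run back lazily, one value at a time.
        with open(path) as f:
            for line in f:
                yield int(line)

    # Phase 2: lazy k-way merge of the sorted runs.
    try:
        yield from heapq.merge(*(read_run(p) for p in run_paths))
    finally:
        # Remove the temporary run files once merging is done.
        for p in run_paths:
            os.remove(p)
```

Calling `list(external_sort(data, max_in_memory=3))` consumes the generator and returns the fully sorted values. The merge phase keeps only one value per run in memory, which is why the approach scales past available RAM.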
Dev Containers
Also, thanks to LogicalPlan::DML(...) serde #14079 going
Looking to get more involved? Please help review code! 🎣
DataFusion has a long history of community members contributing in all aspects of the project. Reviewing PRs is an especially great way to get introduced to the project, help the community and grow your own knowledge -- researching and understanding the code enough to review PRs also often inspires additional ideas for improvements.
We have docs about reviews. TLDR is: look for test coverage, if the change is understandable and well documented, and if the code can be improved. When you think the PR looks good to merge, try @ mentioning one of the committers.
Help wanted
Please feel free to leave your own comments on this ticket if you are looking for help
Community
Upcoming meetups: