Release DataFusion 44.0.0 #13334

Closed
7 of 10 tasks
alamb opened this issue Nov 10, 2024 · 54 comments

@alamb
Contributor

alamb commented Nov 10, 2024

Is your feature request related to a problem or challenge?

Tracking ticket for next release, also a place to track desired inclusions

The last release was https://crates.io/crates/datafusion/43.0.0 on November 8th, 2024, so the next major release would be around December 8, 2024

Prior release tickets:

Items to possibly fix before release

Improved upgrade experience:

Downstream project testing:

@alamb alamb added the enhancement New feature or request label Nov 10, 2024
@alamb alamb mentioned this issue Nov 10, 2024
4 tasks
@alamb
Contributor Author

alamb commented Nov 21, 2024

I would love to get this one in:

Some things to highlight (that are already merged)

@alamb
Contributor Author

alamb commented Nov 21, 2024

@alamb
Contributor Author

alamb commented Nov 25, 2024

@Omega359 and @andygrove suggested in #13525 (comment) that for this release

> we file tickets as part of every release to test DF with latest versions of common external dependencies (datafusion-python, iceberg, lance, delta, etc) as part of the gating criteria for a release. Perhaps this would be as simple as changing the DF version in those projects and running their test suite. Any blockers that are found in DF have tickets filed and are fixed, other blockers in the respective projects have tickets filed in those projects

Perhaps for 44.0.0 we can try this with our own subprojects (Ballista, Comet, DF Python, DF Ray)
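As a rough illustration of "changing the DF version and running the test suite", a downstream Rust project could temporarily point its DataFusion dependency at the development branch before running `cargo test`. This is only a sketch under assumptions: the dependency layout is hypothetical and not taken from any of the projects named above, and a real project would need to switch every `datafusion-*` crate it depends on directly to the same source so the types line up.

```toml
# Cargo.toml of a hypothetical downstream project (illustrative only).
[dependencies]
# Track the tip of DataFusion's main branch instead of a crates.io release.
# For a reproducible test run, `branch` can be replaced by `rev` with a
# specific commit hash.
datafusion = { git = "https://github.com/apache/datafusion", branch = "main" }
```

Any compile errors or test failures found this way can then be filed either against DataFusion or against the downstream project, as the quoted suggestion describes.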

@alamb
Contributor Author

alamb commented Dec 6, 2024

I started gathering a list of items we think we should fix prior to the release in the description

@alamb
Contributor Author

alamb commented Dec 6, 2024

Does anyone have any opinion about holding the DataFusion 44 release for the next major arrow release?

That would fix a bunch of the StringView issues, but would likely delay DataFusion 44 for a few more weeks

@Omega359
Contributor

Omega359 commented Dec 7, 2024

It's soon to be the holiday season so I'm all for cooking the release a bit more.

@alamb
Contributor Author

alamb commented Dec 7, 2024

I would personally love to see DataFusion 44.0.0 be lauded as "super stable" and have few upgrade issues (we would largely achieve this by testing upgrades with other projects prior to release)

The downside is that the changeset in DataFusion 44.0.0 might be larger

@andygrove
Member

I'd like to see an upgrade guide for the 44 release (and am willing to take the lead on this). I am trying to upgrade Comet now and am running into some issues.

I am +1 for waiting for the next Arrow release.

@andygrove
Member

I filed #13702

@andygrove
Member

How would everyone feel about increasing the length of release votes from 3 days to 7 days to give downstream projects more time to test the release?

Sometimes the vote starts on a Friday and passes on a Monday and we can't expect everyone to be working weekends.

@alamb
Contributor Author

alamb commented Dec 9, 2024

> How would everyone feel about increasing the length of release votes from 3 days to 7 days to give downstream projects more time to test the release?
>
> Sometimes the vote starts on a Friday and passes on a Monday and we can't expect everyone to be working weekends.

In my opinion, we would ideally do the testing before we make the release candidate (as the overhead of making RCs is non-trivial). However, I am not opposed to extending the voting timeline if that would get us more testing time

@andygrove
Member

Testing before we create the RC makes sense

@andygrove
Member

I'm still working on updating Comet to work with latest DF from main (upgrading from 43.0.0). This is quite a disruptive upgrade for us so far.

Screenshot from 2024-12-09 17-37-14

@alamb
Contributor Author

alamb commented Dec 13, 2024

I have been thinking about this upgrade and the next arrow release

Specifically I think we need this upgrade in order to resolve some of the issues in 44 related to string view

However, what I would like to propose is:

  1. Plan to release DataFusion 44.0.0 without also upgrading arrow-rs

The downside of this plan is

  1. It will take longer to get the arrow upgrade (and thus the string view fixes) throughout the ecosystem

The upsides are

  1. It will reduce the complexity of the DataFusion 44 upgrade
  2. It will give us a longer period to test the upgrade integration with DataFusion

If someone is critically waiting on the string view fixes, we can also contemplate making another arrow incremental release

@alamb
Contributor Author

alamb commented Dec 13, 2024

I am also going to coordinate with the delta-rs folks to try and test upgrading prior to release:

@timsaucer
Contributor

Starting an issue and branch in datafusion-python : apache/datafusion-python#972

@rluvaton
Contributor

rluvaton commented Dec 14, 2024

I'll give my 2 cents from my experience regarding how other popular projects handle this issue

In Node.js (I'm a core collaborator), before each release we run the tests of the most popular packages against the new Node version to make sure we are not breaking them; if we are, we let them know. This is really helpful to avoid breaking the entire world

This process is called CITGM; you can watch this talk for more info

@andygrove
Member

I am +1 for releasing DF 44 without the next major version of Arrow. However, there are some bug fixes and performance improvements in Arrow that I would like to use so would like to see a new minor release (and am happy to help with this if needed).

@mbwhite
Contributor

mbwhite commented Dec 16, 2024

Hi - would like to get #13647 out as soon as is practical - a small fix but does unblock my use of DataFusion for Substrait.

Thanks!

@alamb
Contributor Author

alamb commented Dec 16, 2024

> I am +1 for releasing DF 44 without the next major version of Arrow. However, there are some bug fixes and performance improvements in Arrow that I would like to use so would like to see a new minor release (and am happy to help with this if needed).

@alamb
Contributor Author

alamb commented Dec 22, 2024

I am looking at #13510 now, and then plan to make a PR with release notes, version update, etc

Update:

@andygrove
Member

@alamb The miri issue does not seem to be resolved for Comet - #13835 (comment)

Is there anything I should have done other than set default-features=false on all DF crates?

@alamb
Contributor Author

alamb commented Dec 22, 2024

> @alamb The miri issue does not seem to be resolved for Comet - #13835 (comment)
>
> Is there anything I should have done other than set default-features=false on all DF crates?

That sounds like it is the right thing to me 🤔
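For readers following along, a minimal sketch of what "set default-features=false on all DF crates" might look like in a downstream `Cargo.toml`. The crate selection and version numbers here are illustrative assumptions, not Comet's actual manifest.

```toml
[dependencies]
# Illustrative crate list and versions; adjust to the crates the project uses.
# `default-features = false` opts out of each crate's default feature set.
datafusion = { version = "44.0.0", default-features = false }
datafusion-common = { version = "44.0.0", default-features = false }
datafusion-expr = { version = "44.0.0", default-features = false }
```

One general Cargo behavior worth keeping in mind: features are unified across the dependency graph, so if any other crate in the build depends on one of these crates with default features enabled, those features are still compiled in.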

@alamb
Contributor Author

alamb commented Dec 23, 2024

Since @buraksenn did the work to make this optional in [minor] make recursive package dependency optional #13778, perhaps they have some idea?

Update: I think I know what is wrong -- working on a fix now - #13887

@alamb
Contributor Author

alamb commented Dec 24, 2024

I think all we are waiting on now is someone to review / approve (or fix in some other way) the recursive-protection feature fix:

@alamb
Contributor Author

alamb commented Dec 24, 2024

Update, when preparing to make a final RC I found another issue:

I have a proposed fix here:

@alamb
Contributor Author

alamb commented Dec 24, 2024

Ok, we are clear for 🚀 from what I can tell -- the comet upgrade is working. I will merge the following PR

@alamb
Contributor Author

alamb commented Dec 25, 2024

I made a release candidate: https://lists.apache.org/thread/78zxfpn7fh1csz6fdolpgh3lb4ltxngp

@shehabgamin
Contributor

> I made a release candidate: https://lists.apache.org/thread/78zxfpn7fh1csz6fdolpgh3lb4ltxngp

I'll re-test Sail with the release candidate tonight!

@shehabgamin
Contributor

Test 2 complete! Smooth Sailing testing commit 073a3b1 from DataFusion Release Candidate branch.

Test details here: #13855 (comment)

@alamb
Contributor Author

alamb commented Dec 27, 2024

Thanks @shehabgamin

@timsaucer
Contributor

If we can make a new RC I would recommend cherry picking this fix for initcap: #13909

@alamb
Contributor Author

alamb commented Dec 28, 2024

As @timsaucer mentions above (and on the mailing list): https://lists.apache.org/thread/34xy1nm47tkrr8lszxzwcrlf5b47h333

There was a regression introduced into initcap that has since been fixed: #13909

I will make a new RC

@alamb
Contributor Author

alamb commented Dec 28, 2024

I have created a new RC and started a new vote thread: https://lists.apache.org/thread/p0hyr78zs03f3snj0mh3vqm6c3kjwshn

@alamb
Contributor Author

alamb commented Dec 31, 2024

The release has been approved and published to crates.io: https://lists.apache.org/thread/7w64xqzn71plvmxy61v6px8bb994kr79 🎉

https://crates.io/crates/datafusion/44.0.0

I have also made a small PR to improve the docs for the next release:

@niebayes
Contributor

niebayes commented Jan 2, 2025

@alamb Hi, is the release for datafusion-cli delayed?

@alamb
Contributor Author

alamb commented Jan 2, 2025

> @alamb Hi, is the release for datafusion-cli delayed?

Good catch @niebayes -- I forgot to push this to crates.io. I have done so now and datafusion-cli 44.0.0 is available:

https://crates.io/crates/datafusion-cli/44.0.0

@alamb
Contributor Author

alamb commented Jan 2, 2025

Here is a ticket to run ClickBench on DataFusion 44:

@alamb alamb mentioned this issue Jan 4, 2025
14 tasks
@alamb
Contributor Author

alamb commented Jan 4, 2025

Filed ticket for the next release:
