Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regenerate arrow-ipc/src/gen with patched flatbuffers #6426

Merged
merged 12 commits into from
Jan 8, 2025

Conversation

bkietz
Copy link
Member

@bkietz bkietz commented Sep 20, 2024

Which issue does this PR close?

Closes #5052

What changes are included in this PR?

This PR regenerates using patched codegen in google/flatbuffers#8398

  • arrow-ipc/regen.sh is extended to
    • If the $FLATC environment variable is defined, that will be used instead of building flatc from source (useful when testing against patched codegen)
    • Prefix the import flatbuffers::PushAlignment to all generated files for reference by impls of Push::alignment
  • Endianness::equals_to_target_endianness() is moved to arrow-ipc/src/lib.rs (this was originally added to arrow-ipc/src/gen/Schema.rs manually in Result into error in case of endianness mismatches #5301)
  • regen.sh was rerun, cargo doc and cargo test didn't error locally

Are there any user-facing changes?

no

@github-actions github-actions bot added the arrow Changes to the arrow crate label Sep 20, 2024
arrow-ipc/Cargo.toml Outdated Show resolved Hide resolved
sed -i '' '/use core::mem;/d' $f
sed -i '' '/use core::cmp::Ordering;/d' $f
sed -i '' '/use self::flatbuffers::{EndianScalar, Follow};/d' $f
sed --in-place='' '/extern crate flatbuffers;/d' $f
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why but bash/sed produced an error for me until I made this substitution

sed: can't read /extern crate flatbuffers;/d: No such file or directory

Reading the man page, it seems like the expected invocation for the short flag is -ibackup, equivalent to --inplace=backup (or -i, equivalent to --inplace='')

       -i[SUFFIX], --in-place[=SUFFIX]

              edit files in place (makes backup if SUFFIX supplied)

Should I replace this with

Suggested change
sed --in-place='' '/extern crate flatbuffers;/d' $f
sed -i '/extern crate flatbuffers;/d' $f

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC Mac's sed (BSD utils) and the Linux coreutils sed behave differently. It's a bit of a pain.


# Some files need prefixes
if [[ $f == "File.rs" ]]; then
# Now prefix the file with the static contents
echo -e "${PREFIX}" "${SCHEMA_IMPORT}" | cat - $f > temp && mv temp $f
elif [[ $f == "Message.rs" ]]; then
sed --in-place='' 's/List<Int16>/\`List<Int16>\`/g' $f
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was another manual edit of the generated files. Moving it here should simplify the next time regen.sh is used. Are there any other manual edits which I haven't noticed?

arrow-ipc/Cargo.toml Outdated Show resolved Hide resolved
@bkietz
Copy link
Member Author

bkietz commented Sep 20, 2024

I suspect the flight failures are due to the slight increase in size due to padding FieldNode and Buffer vectors for alignment; the overages are all exactly 8 bytes greater than allowed. May I just increase the allowance?

@bkietz
Copy link
Member Author

bkietz commented Sep 20, 2024

@tustvold @evgenyx00 could you verify this fixes the issue before we start asking for attention on google/flatbuffers?

@alamb
Copy link
Contributor

alamb commented Sep 25, 2024

Perhaps we could re-enable the test for nanoarrow that was disabled in #6449 as part of this PR?

@bkietz bkietz force-pushed the regen-flatbuffers-with-push-alignment branch from 41d3a5b to 7c8324f Compare September 26, 2024 16:14
@@ -65,8 +65,7 @@ jobs:
ARROW_INTEGRATION_JAVA: ON
ARROW_INTEGRATION_JS: ON
ARCHERY_INTEGRATION_TARGET_IMPLEMENTATIONS: "rust"
# Disable nanoarrow integration, due to https://github.com/apache/arrow-rs/issues/5052
Copy link
Contributor

@alamb alamb Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this line and having the test run successfully is quite compelling. Nice work @bkietz

Screenshot 2024-09-26 at 4 06 35 PM

@bkietz bkietz force-pushed the regen-flatbuffers-with-push-alignment branch from 7c8324f to 8b19902 Compare October 25, 2024 16:57
@bkietz bkietz marked this pull request as ready for review October 25, 2024 16:58
arrow-ipc/regen.sh Outdated Show resolved Hide resolved
@bkietz bkietz requested a review from alamb October 28, 2024 21:07
@bkietz
Copy link
Member Author

bkietz commented Nov 4, 2024

@alamb Is there anything which needs to happen to this PR?

@@ -39,7 +39,7 @@ arrow-buffer = { workspace = true }
arrow-cast = { workspace = true }
arrow-data = { workspace = true }
arrow-schema = { workspace = true }
flatbuffers = { version = "24.3.25", default-features = false }
flatbuffers = { default-features = false, git = "https://github.com/google/flatbuffers.git", rev = "49061f8c7c99363eeea25c5e4337ebb499928467" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would need to be updated to a non-git version prior to merge, otherwise we'd be unable to make a release off main.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact I think we can't actually release a fix for arrow with this fix as the upstream change adds a new method alignment -- it doesn't just update the generated code I think

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I reverted this pin it fails to compile

error[E0433]: failed to resolve: could not find `PushAlignment` in `flatbuffers`
  --> arrow-ipc/src/gen/File.rs:71:22
   |
71 |         flatbuffers::PushAlignment::new(8)
   |                      ^^^^^^^^^^^^^ could not find `PushAlignment` in `flatbuffers`

Thus I think we need to wait for the next flatbuffers release 🤔

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bkietz -- I think I mostly need about 4 extra hours each day !

@@ -39,7 +39,7 @@ arrow-buffer = { workspace = true }
arrow-cast = { workspace = true }
arrow-data = { workspace = true }
arrow-schema = { workspace = true }
flatbuffers = { version = "24.3.25", default-features = false }
flatbuffers = { default-features = false, git = "https://github.com/google/flatbuffers.git", rev = "49061f8c7c99363eeea25c5e4337ebb499928467" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact I think we can't actually release a fix for arrow with this fix as the upstream change adds a new method alignment -- it doesn't just update the generated code I think

@@ -39,7 +39,7 @@ arrow-buffer = { workspace = true }
arrow-cast = { workspace = true }
arrow-data = { workspace = true }
arrow-schema = { workspace = true }
flatbuffers = { version = "24.3.25", default-features = false }
flatbuffers = { default-features = false, git = "https://github.com/google/flatbuffers.git", rev = "49061f8c7c99363eeea25c5e4337ebb499928467" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I reverted this pin it fails to compile

error[E0433]: failed to resolve: could not find `PushAlignment` in `flatbuffers`
  --> arrow-ipc/src/gen/File.rs:71:22
   |
71 |         flatbuffers::PushAlignment::new(8)
   |                      ^^^^^^^^^^^^^ could not find `PushAlignment` in `flatbuffers`

Thus I think we need to wait for the next flatbuffers release 🤔

@osipovartem
Copy link

Hey! Any updates here? 🙏

@alamb
Copy link
Contributor

alamb commented Dec 16, 2024

Hey! Any updates here? 🙏

So maybe someone could file / find a ticket upstream in flatbuffers asking about a new rust release?

@crepererum
Copy link
Contributor

I've pinged a potential person here:
google/flatbuffers#8398 (comment)

Let's see. flatbuffers isn't a particularly active project.

@osipovartem
Copy link

New flatbuffers version has been released (24.12.23)

@alamb
Copy link
Contributor

alamb commented Dec 30, 2024

New flatbuffers version has been released (24.12.23)

Thanks @osipovartem

@osipovartem or @bkietz -- can one of you prepare a PR using the newly released flatbuffers code? I would be happy to review it afterwards.

@bkietz bkietz force-pushed the regen-flatbuffers-with-push-alignment branch from 16783bc to ac8b01c Compare January 6, 2025 20:03
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great (and epic) to me. Thank you so much @bkietz for pushing it along and sticking with it 🎉 🚀 🏅

@@ -38,7 +38,7 @@ arrow-array = { workspace = true }
arrow-buffer = { workspace = true }
arrow-data = { workspace = true }
arrow-schema = { workspace = true }
flatbuffers = { version = "24.3.25", default-features = false }
flatbuffers = { version = "24.12.23", default-features = false }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@@ -1057,15 +1057,6 @@ impl Endianness {
_ => None,
}
}

/// Returns true if the endianness of the source system matches the endianness of the target system.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 for extracting this out to a non gen file

sed -i '' '/use core::mem;/d' $f
sed -i '' '/use core::cmp::Ordering;/d' $f
sed -i '' '/use self::flatbuffers::{EndianScalar, Follow};/d' $f
sed --in-place='' '/extern crate flatbuffers;/d' $f
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC Mac's sed (BSD utils) and the Linux coreutils sed behave differently. It's a bit of a pain.

let src = ::core::slice::from_raw_parts(self as *const Block as *const u8, Self::size());
dst.copy_from_slice(src);
}
+ #[inline]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't flatbuffers 24.12.23 do that now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a holdover; we don't need the patch anymore. I've removed it

@alamb
Copy link
Contributor

alamb commented Jan 8, 2025

woohoo! Thanks again @bkietz

@alamb alamb merged commit 485dbb1 into apache:main Jan 8, 2025
34 checks passed
@paleolimbot
Copy link
Member

Thank you all for following up on this!

paleolimbot added a commit to apache/arrow-nanoarrow that referenced this pull request Jan 8, 2025
After apache/arrow-rs#6426 , the issue causing
the arrow-rs integration has been fixed!
@bkietz bkietz deleted the regen-flatbuffers-with-push-alignment branch January 9, 2025 19:01
@alamb
Copy link
Contributor

alamb commented Jan 12, 2025

Thank you all for following up on this!

💯 -- mad props to @bkietz for sure

CurtHagenlocher pushed a commit to CurtHagenlocher/arrow-rs that referenced this pull request Jan 13, 2025
* regenerate arrow-ipc/src/gen with patched flatbuffers

* use git repo instead of local path

* add backticks

* expand allowed overage to accommodate more alignment padding

* re-enable nanoarrow integration test

* add assertions that struct alignment is correct

* remove struct alignment assertions

* apply a patch to generated code rather than requiring patched flatc

* point to google/flatbuffers with pub PushAlignment

* add license header to gen.patch

* use flatbuffers 24.12.23

* remove unnecessary gen.patch
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate arrow-flight Changes to the arrow-flight crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Interoperability between arrow-rs and nanoarrow
6 participants