
style+perf: clean-up and optimize remove_empty_byte_from_padded_bytes_unchecked fn #41

Merged


samlaf (Contributor) commented Jan 13, 2025

This was a fun weekend. I got to learn a crap ton about Rust iterators, assembly output, godbolt, LLVM, etc.

I was just trying to make this function cleaner by rewriting it with functional iterators, but in doing so I realized the code became much slower (up to 7x, depending on input size). With two small modifications, I managed to get the compiled output to use a pre-allocated output vector and SIMD instructions for copying, which made the code 2-7x FASTER depending on input size.
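The description doesn't spell out the two modifications, so here is a hedged sketch of the general pattern, not the merged code. It assumes the function drops the first (padding) byte of every 32-byte chunk (`BYTES_PER_FIELD_ELEMENT = 32`), and both function names are hypothetical. A `flat_map` over individual bytes typically can't tell `collect` the final length up front and copies byte by byte; pre-allocating the output and copying each 31-byte tail with `extend_from_slice` gives the compiler a bulk copy it can lower to `memcpy`.

```rust
/// Assumed layout: each 32-byte field element starts with one padding byte.
const BYTES_PER_FIELD_ELEMENT: usize = 32;

/// Straightforward functional version (hypothetical name). `flat_map` yields
/// bytes one at a time and `collect` typically can't learn the final length
/// up front, so the Vec grows by reallocation and the copy is per-byte.
pub fn remove_empty_byte_naive(data: &[u8]) -> Vec<u8> {
    data.chunks(BYTES_PER_FIELD_ELEMENT)
        .flat_map(|chunk| chunk[1..].iter().copied())
        .collect()
}

/// Same output, but (1) the output Vec is allocated once with its exact final
/// size, and (2) each chunk's tail is copied as a whole slice, which the
/// compiler can lower to a bulk memcpy.
pub fn remove_empty_byte_fast(data: &[u8]) -> Vec<u8> {
    let chunks = data.chunks(BYTES_PER_FIELD_ELEMENT);
    // Every chunk, including a partial last one, loses exactly one byte.
    let mut out = Vec::with_capacity(data.len() - chunks.len());
    chunks.for_each(|chunk| out.extend_from_slice(&chunk[1..]));
    out
}
```

Whether LLVM actually emits vector instructions for the copy depends on the target and optimization level; inspecting the assembly (e.g. on godbolt, as done for this PR) is the reliable way to confirm it.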

Benchmarks are available in master...perf--remove-empty-byte-from-padded-bytes-fn-benchmark. Here are the results (functional_fast is the function implemented in this PR):

  1. For 32B inputs: (benchmark chart)
  2. For 32KiB inputs: (benchmark chart)
  3. For 32MiB inputs: (benchmark chart)
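The benchmark code itself lives on the linked branch and isn't reproduced here. As a rough, std-only stand-in (not the branch's code), the sketch below checks that an index-based and an iterator-based variant agree and then times each one at the given input sizes. The function names, the one-leading-padding-byte-per-32-byte-chunk layout, and the generated input data are all assumptions.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Assumed layout: one leading padding byte per 32-byte chunk.
const BYTES_PER_FIELD_ELEMENT: usize = 32;

/// Index-based variant (hypothetical stand-in for the `fast` function).
pub fn remove_empty_byte_loop(data: &[u8]) -> Vec<u8> {
    let n_chunks = data.len().div_ceil(BYTES_PER_FIELD_ELEMENT);
    let mut out = vec![0u8; data.len() - n_chunks];
    for i in 0..n_chunks {
        let start = i * BYTES_PER_FIELD_ELEMENT + 1; // skip the padding byte
        let end = ((i + 1) * BYTES_PER_FIELD_ELEMENT).min(data.len());
        let dst = i * (BYTES_PER_FIELD_ELEMENT - 1);
        out[dst..dst + (end - start)].copy_from_slice(&data[start..end]);
    }
    out
}

/// Iterator-based variant (hypothetical stand-in for `functional_fast`).
pub fn remove_empty_byte_iter(data: &[u8]) -> Vec<u8> {
    let chunks = data.chunks(BYTES_PER_FIELD_ELEMENT);
    let mut out = Vec::with_capacity(data.len() - chunks.len());
    chunks.for_each(|chunk| out.extend_from_slice(&chunk[1..]));
    out
}

/// Mean wall-clock time of `f` over `iters` runs (`iters` must be > 0).
pub fn time_it(f: fn(&[u8]) -> Vec<u8>, input: &[u8], iters: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from skipping or hoisting the call.
        black_box(f(black_box(input)));
    }
    start.elapsed() / iters
}

/// For each size: assert both variants agree, then time them.
/// Returns (size, loop time, iterator time).
pub fn compare(sizes: &[usize], iters: u32) -> Vec<(usize, Duration, Duration)> {
    sizes
        .iter()
        .map(|&size| {
            let input: Vec<u8> = (0..size).map(|i| i as u8).collect();
            assert_eq!(remove_empty_byte_loop(&input), remove_empty_byte_iter(&input));
            let t_loop = time_it(remove_empty_byte_loop, &input, iters);
            let t_iter = time_it(remove_empty_byte_iter, &input, iters);
            (size, t_loop, t_iter)
        })
        .collect()
}
```

With the sizes above, the call would be `compare(&[32, 32 * 1024, 32 * 1024 * 1024], iters)`, built with `--release`, since debug-build timings say little about optimized code. A hand-rolled loop like this is noisy; a dedicated harness such as criterion adds warm-up and statistical analysis.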

Note: I decided to implement the functional_fast function instead of the fast function (which contains the same logic but is written without iterators), because I personally find it cleaner to read. I do have to note however that the version with iterators (the one in this PR) is faster on 32KiB inputs but (slightly) slower on 32MiB inputs. If teams ever send huge inputs in the future, we might want to implement both approaches and let them choose. Or perhaps have a wrapper that dispatches to the right implementation based on input size?
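A size-based wrapper like the one suggested above could look roughly like this. It is a sketch under assumptions: the two variants are hypothetical stand-ins for `functional_fast` and `fast`, the layout is one leading padding byte per 32-byte chunk, and `DISPATCH_THRESHOLD` is a placeholder, since the benchmarks only suggest the crossover lies somewhere between 32KiB and 32MiB.

```rust
/// Assumed layout: one leading padding byte per 32-byte chunk.
const BYTES_PER_FIELD_ELEMENT: usize = 32;

/// Hypothetical cut-over size. The benchmarks only show the iterator version
/// winning at 32 KiB and the loop version winning slightly at 32 MiB, so the
/// real threshold would have to be measured.
pub const DISPATCH_THRESHOLD: usize = 1 << 20; // 1 MiB placeholder

/// Iterator-based variant (stand-in for `functional_fast`).
fn remove_empty_byte_iter(data: &[u8]) -> Vec<u8> {
    let chunks = data.chunks(BYTES_PER_FIELD_ELEMENT);
    let mut out = Vec::with_capacity(data.len() - chunks.len());
    chunks.for_each(|chunk| out.extend_from_slice(&chunk[1..]));
    out
}

/// Index-based variant (stand-in for `fast`).
fn remove_empty_byte_loop(data: &[u8]) -> Vec<u8> {
    let n_chunks = data.len().div_ceil(BYTES_PER_FIELD_ELEMENT);
    let mut out = vec![0u8; data.len() - n_chunks];
    for i in 0..n_chunks {
        let start = i * BYTES_PER_FIELD_ELEMENT + 1; // skip the padding byte
        let end = ((i + 1) * BYTES_PER_FIELD_ELEMENT).min(data.len());
        let dst = i * (BYTES_PER_FIELD_ELEMENT - 1);
        out[dst..dst + (end - start)].copy_from_slice(&data[start..end]);
    }
    out
}

/// Picks an implementation by input size; both return identical bytes.
pub fn remove_empty_byte_dispatch(data: &[u8]) -> Vec<u8> {
    if data.len() < DISPATCH_THRESHOLD {
        remove_empty_byte_iter(data)
    } else {
        remove_empty_byte_loop(data)
    }
}
```

Because both paths produce the same output, callers never see which one ran; the cost is keeping two implementations tested against each other.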

samlaf requested review from anupsv and bxue-l2 January 13, 2025 04:03
There were a bunch of warnings that some of the rustfmt options we set were not being applied:
Warning: can't set `wrap_comments = true`, unstable features are only available in nightly channel.
Warning: can't set `normalize_comments = true`, unstable features are only available in nightly channel.

Getting "error: toolchain 'nightly-x86_64-unknown-linux-gnu' is not installed" on GitHub,
and I don't feel like debugging it. I'm not even sure how cargo/Rust are installed there.
Do they come preloaded by default?

This reverts commit 6e87e0a.
samlaf (Contributor, Author) commented Jan 13, 2025

Note: Apologies for the large number of edits that are just formatting; I applied cargo +nightly fmt. I realized our CI is not using the nightly toolchain, which is actually needed for some of the formatting options we use (the CI output shows a bunch of warnings about this). I tried switching the GitHub workflow to nightly in 6e87e0a, but the nightly toolchain wasn't available, so I reverted that change. We should fix that, though.

bxue-l2 previously approved these changes Jan 13, 2025
src/kzg.rs (outdated)
Before:
/// Precompute the primitive roots of unity for binary powers that divide r - 1
/// TODO(anupsv): Move this to the constants file. Ref: https://github.com/Layr-Labs/rust-kzg-bn254/issues/31
After cargo +nightly fmt:
/// Precompute the primitive roots of unity for binary powers that divide r
/// - 1 TODO(anupsv): Move this to the constants file. Ref: https://github.com/Layr-Labs/rust-kzg-bn254/issues/31
Collaborator

Why break the line at "- 1"?

Contributor Author

Argh, that's what our formatter does when run with the nightly toolchain... Think I should just revert that commit?
@anupsv we'll need to look at that formatter config at some point. It doesn't seem that great.

Contributor Author

Reverted the cargo +nightly fmt commit and formatted with stable Rust instead. PTAL

bxue-l2 (Collaborator) commented Jan 13, 2025

Wait, exactly which approach did you implement? It's confusing to read: "I do have to not however that the version with iterators (the one in this PR) is faster on 32KiB inputs but (slightly) slower on 32MiB."

Code-wise it looks right to me.

anupsv (Collaborator) left a comment

lgtm

samlaf (Contributor, Author) commented Jan 14, 2025

> Wait, exactly which approach did you implement? It's confusing to read: "I do have to not however that the version with iterators (the one in this PR) is faster on 32KiB inputs but (slightly) slower on 32MiB."
>
> Code-wise it looks right to me.

Updated the PR description; it should have read "I do have to NOTE however".

samlaf merged commit b83fc92 into master Jan 14, 2025
1 check passed
samlaf deleted the perf--optimize-fn-remote-empty-byte-from-padded-bytes-unchecked branch January 14, 2025 01:13
3 participants