Change compression format and S3 URL for Python runtime archives #1567
Merged
Builds for all supported Python versions have been triggered using the GitHub CLI:

    for v in 3.8.{0..19} 3.9.{0..19} 3.10.{0..14} 3.11.{0..9} 3.12.{0..3}; do
      gh workflow run build_python_runtime.yml --ref new-url-structure-and-zstd -F "python_version=${v}"
    done

And can be viewed here:
edmorley force-pushed the new-url-structure-and-zstd branch from b608c8a to 546bb96 on April 18, 2024 09:56
runesoerensen approved these changes on Apr 18, 2024
edmorley force-pushed the improved-eol-error-messages branch from acb5259 to b04f3eb on April 18, 2024 15:46
edmorley added a commit that referenced this pull request on Apr 18, 2024
For cases where a requested Python version is both (a) EOL, and (b) was never built for that stack (such as is the case when we add new stacks), previously the generic "version isn't available for this stack" error message was shown instead of the more specific EOL Python version error message.
Now, the EOL version check is performed first before the S3 presence check, so the more specific EOL message is shown for this case.
In addition to improving the UX, making this change now reduces the test fixture churn both when we add a new stack and for #1567.
I've also dropped the "PyPy is no longer supported" error message and associated test, since very few apps ever used it and it's now been 19 months since support was removed in #1364, so it's fine to show the generic "Python version isn't available" error message for it instead.
GUS-W-15541279.
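The check ordering described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the buildpack's actual code: the function names, the error message wording, the list of EOL series, and the `ARCHIVE_BASE_URL` variable with its filename layout are all assumptions made for the example.

```shell
# Sketch only: names, messages and the EOL list are hypothetical.

# Returns 0 if the requested version belongs to a series treated as EOL.
# The list here is illustrative rather than the buildpack's real list.
is_eol_python() {
  case "$1" in
    2.7.* | 3.6.* | 3.7.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Placeholder for the S3 presence check: an HTTP HEAD request against an
# assumed archive URL. ARCHIVE_BASE_URL and the filename are hypothetical.
archive_exists() {
  curl --silent --fail --head --output /dev/null \
    "${ARCHIVE_BASE_URL:-https://example.com}/python-$1.tar.zst"
}

# Runs the EOL check before the presence check, so an EOL version that was
# never built for a stack still gets the more specific EOL message.
check_python_version() {
  local version="$1"
  if is_eol_python "${version}"; then
    echo "Python ${version} has reached end-of-life and is no longer supported."
    return 1
  fi
  if ! archive_exists "${version}"; then
    echo "Python ${version} is not available for this stack."
    return 1
  fi
  echo "Python ${version} is available."
}
```

With the checks in this order, a request for an EOL version never reaches the network call, which is also why fewer test fixtures depend on which archives happen to exist for a given stack.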
As part of the CNB multi-architecture support work, we need to change the Python runtime archive S3 URLs to include the architecture name.
In addition, for the CNB transition from "stacks" to "targets", it would be helpful to switch from stack ID references (such as `heroku-22`) in the URL scheme, to the distro name+version (eg `ubuntu` and `22.04`) available to CNBs via the CNB targets feature. See:
https://github.com/buildpacks/spec/blob/buildpack/0.10/buildpack.md#targets-1
Rather than duplicate the Python archives on S3 under different filenames, it makes sense to migrate this buildpack to the new archive names too, so the same S3 archives can be used by both this buildpack and the CNB.
Moving to new archive names/URLs also means we can safely regenerate all existing Python versions to pick up the changes in #1566 (and changes made in the past, such as #1320), since we won't need to overwrite the old archives and so rolling back to an older buildpack version will work as expected.
Since we're changing the S3 URLs anyway, now is also a good time to switch archive compression format from gzip to Zstandard (something that's long overdue).
Zstandard (aka zstd) is a much superior compression format over gzip (smaller archives and much faster decompression), and is seeing widespread adoption across multiple ecosystems (eg APT packages, Docker images, web browsers etc). See:
https://github.com/facebook/zstd
https://github.com/facebook/zstd/blob/dev/programs/README.md#usage-of-command-line-interface
Our base images already have `zstd` installed (and for Rust for the CNB, there is the `zstd` crate available), so it's an easy switch.
Various compression levels were tested using zstd's benchmarking feature and in the end the highest level of compression was picked, since:
1. Unlike some other compression algorithms, zstd's decompression speed is generally not affected by the compression level.
2. We only have to perform the compression once (when compiling Python).
3. Even at the highest compression ratio, it only takes 20 seconds to compress the Python archives compared to the 10 minutes it takes to compile Python itself (when using PGO+LTO).
For the Ubuntu 22.04 Python 3.12.3 archive, switching from gzip to zstd (level 22, with long window mode enabled) results in a 26% reduction in compressed archive size.
GUS-W-15158299.
GUS-W-15505556.
edmorley force-pushed the new-url-structure-and-zstd branch from 546bb96 to cae44b2 on April 18, 2024 15:49
Merged
edmorley added a commit to heroku/buildpacks-python that referenced this pull request on May 2, 2024
…argets (#197)
A `libcnb.rs` release supports a single Buildpack API version, so whenever we update to a libcnb release that now implements a newer Buildpack API version, we must switch to that version in the buildpack at the same time.
This change updates the buildpack to the latest libcnb release, which requires a switch to Buildpack API 0.10, a switch from stacks to targets, and some adjustments for layer API changes.
As part of the switch from stacks to targets, the buildpack now consumes the Python runtime from the new S3 location/filenames (that use distro name/version in the URL instead of stack ID), which were added in: heroku/heroku-buildpack-python#1567
The new archives also now use Zstandard (aka zstd) for compression instead of gzip, which results in a faster download due to the smaller archive size (for example, the Ubuntu 22.04 Python 3.12.3 AMD64 archive was 26% smaller) as well as faster decompression. This required switching from the `flate2` crate to the `zstd` crate.
A side-effect of switching to the new S3 files is that the archives for Python 3.7 are no longer available, since I intentionally did not build them given that Python 3.7 is EOL. As such, this change also drops support for Python 3.7 (something that the classic buildpack has already done, and would have been done here already if it were not for being blocked on #8).
The switch to targets unblocks Heroku-24/multi-architecture support, which will be handled in a later PR.
See:
https://github.com/heroku/libcnb.rs/blob/main/CHANGELOG.md#0210---2024-04-30
https://github.com/buildpacks/spec/releases/tag/buildpack%2Fv0.10
https://github.com/buildpacks/spec/blob/buildpack/0.10/buildpack.md#targets-1
https://docs.rs/zstd/latest/zstd/
Closes #192.
Closes #194.
GUS-W-15261168.
(This change has been split out of the Heroku-24 PR for easier review.)
As part of the CNB multi-architecture support work, we need to change the Python runtime archive S3 URLs to include the architecture name. In addition, for the CNB transition from "stacks" to "targets", it would be helpful to switch from stack ID references (such as `heroku-22`) in the URL scheme, to the distro name+version (eg `ubuntu` and `22.04`) available to CNBs via the CNB targets feature.
See: https://github.com/buildpacks/spec/blob/buildpack/0.10/buildpack.md#targets-1
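To make the shape of the new scheme concrete, here is a rough sketch of building an archive URL from the distro name, distro version and architecture rather than from a stack ID. The base URL and the exact filename layout are assumptions for illustration (the description above does not spell them out), and both function names are hypothetical.

```shell
# Sketch only: the base URL and filename layout are hypothetical.
S3_BASE_URL="https://example.com/python-runtimes"

# Maps a stack ID (old scheme) to the distro version used by the new scheme.
stack_to_distro_version() {
  case "$1" in
    heroku-20) echo "20.04" ;;
    heroku-22) echo "22.04" ;;
    *)
      echo "Unknown stack: $1" >&2
      return 1
      ;;
  esac
}

# Builds an archive URL from the Python version, distro name, distro version
# and CPU architecture, with no reference to the stack ID.
python_archive_url() {
  local python_version="$1" distro_name="$2" distro_version="$3" arch="$4"
  echo "${S3_BASE_URL}/python-${python_version}-${distro_name}-${distro_version}-${arch}.tar.zst"
}

# Example: resolve the URL for Python 3.12.3 on the heroku-22 stack (amd64).
distro_version="$(stack_to_distro_version heroku-22)"
python_archive_url 3.12.3 ubuntu "${distro_version}" amd64
```

Keying the URL on distro name, version and architecture means the same lookup works whether the caller starts from a stack ID (this buildpack) or from CNB target metadata.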
Rather than duplicate the Python archives on S3 under different filenames/locations, it makes sense to migrate this buildpack to the new archive names too, so the same S3 archives can be used by both this buildpack and the CNB.
Moving to new archive names/URLs also means we can safely regenerate all existing Python versions to pick up the changes in #1566 (and changes made in the past, such as #1319, #1320, #1321 and #1322), since we won't have to worry about overwriting the old archives (which is something we've typically avoided, since it isn't compatible with the model of being able to roll back to an older buildpack version to return to prior behaviour).
Since we're changing the S3 URLs anyway, now is also a good time to make another change that would otherwise cause churn in the S3 URLs again (which affects people who pin the buildpack version): switching archive compression format from gzip to Zstandard (something that we've been wanting to do for a while).
Zstandard (aka zstd) is a far superior compression format to gzip (smaller archives and much faster decompression), and is seeing widespread adoption across multiple ecosystems (eg APT packages, Docker images, web browsers etc).
See:
https://github.com/facebook/zstd
https://github.com/facebook/zstd/blob/dev/programs/README.md#usage-of-command-line-interface
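Our base images already have `zstd` installed (and for Rust for the CNB, there is the `zstd` crate available), so it's an easy switch.
Various compression levels were tested using zstd's benchmarking feature and in the end the highest level of compression was picked, since:
1. Unlike some other compression algorithms, zstd's decompression speed is generally not affected by the compression level.
2. We only have to perform the compression once (when compiling Python).
3. Even at the highest compression ratio, it only takes 20 seconds to compress the Python archives compared to the 10 minutes it takes to compile Python itself (when using PGO+LTO).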
For the Ubuntu 22.04 Python 3.12.3 archive, switching from gzip to zstd (level 22, with long window mode enabled) results in a 26% reduction in compressed archive size.
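As a rough illustration of those settings, the sketch below packs a directory with `tar` and compresses it with the `zstd` CLI at level 22 with long distance matching enabled, then extracts it again. The function names, paths and the optional level argument are assumptions for the example, and the exact flags used by the real build workflow may differ.

```shell
# Sketch only: function names and the optional level argument are hypothetical.

# Packs a directory into a zstd-compressed tar archive.
#   -22 --ultra : highest compression level (levels above 19 require --ultra)
#   --long      : enable long distance matching (long window mode)
compress_runtime() {
  local src_dir="$1" archive="$2" level="${3:-22}"
  tar --create --file - --directory "${src_dir}" . \
    | zstd "-${level}" --ultra --long --quiet --force -o "${archive}"
}

# Unpacks the archive. Streaming through zstd avoids relying on the local
# tar build having its own zstd support.
extract_runtime() {
  local archive="$1" dest_dir="$2"
  mkdir -p "${dest_dir}"
  zstd --decompress --stdout "${archive}" \
    | tar --extract --file - --directory "${dest_dir}"
}
```

For example, `compress_runtime /tmp/python-3.12.3 python.tar.zst` followed by `extract_runtime python.tar.zst /app/.heroku/python` (both paths hypothetical). To compare levels before settling on one, zstd's `-b#` and `-e#` options benchmark a range of compression levels against a sample file.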
GUS-W-15158299.
GUS-W-15505556.