Change compression format and S3 URL for Python runtime archives
As part of the CNB multi-architecture support work, we need to change
the Python runtime archive S3 URLs to include the architecture name. In
addition, for the CNB transition from "stacks" to "targets", it would be
helpful to switch from stack ID references (such as `heroku-22`) in the
URL scheme, to the distro name+version (eg `ubuntu` and `22.04`)
available to CNBs via the CNB targets feature.

See:
https://github.com/buildpacks/spec/blob/buildpack/0.10/buildpack.md#targets-1
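As a rough illustration of the new URL scheme, the sketch below derives the archive URL from a stack ID. The function name and the base URL passed to it are hypothetical, the version argument is assumed to already carry the `python-` prefix, and the architecture is hard-coded to `amd64` as in this commit.

```shell
# Hypothetical helper (not part of the buildpack): builds the new-style archive URL.
# Assumes a stack ID of the form 'heroku-NN' and a hard-coded 'amd64' architecture.
python_archive_url() (
  # $1 = base URL, $2 = stack ID (e.g. 'heroku-22'), $3 = version (e.g. 'python-3.12.3')
  # Strip the 'heroku-' prefix and append '.04': 'heroku-22' -> '22.04'.
  ubuntu_version="${2#heroku-}.04"
  printf '%s/%s-ubuntu-%s-amd64.tar.zst\n' "$1" "$3" "${ubuntu_version}"
)
```

The old scheme placed the stack ID in the path (`<base>/<stack>/runtimes/<version>.tar.gz`), whereas the new one encodes the distro name, distro version and architecture in the filename itself.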

Rather than duplicate the Python archives on S3 under different
filenames, it makes sense to migrate this buildpack to the new
archive names too, so the same S3 archives can be used by both
this buildpack and the CNB.

Moving to new archive names/URLs also means we can safely regenerate all
existing Python versions to pick up the changes in #1566 (and changes
made in the past, such as #1320), since we won't need to overwrite the
old archives and so rolling back to an older buildpack version will work
as expected.

Since we're changing the S3 URLs anyway, now is also a good time to
switch archive compression format from gzip to Zstandard (something
that's long overdue).

Zstandard (aka zstd) is a far superior compression format to gzip
(smaller archives and much faster decompression), and is seeing
widespread adoption across multiple ecosystems (e.g. APT packages, Docker
images and web browsers).

See:
https://github.com/facebook/zstd
https://github.com/facebook/zstd/blob/dev/programs/README.md#usage-of-command-line-interface

Our base images already have `zstd` installed (and for Rust for the CNB,
there is the `zstd` crate available), so it's an easy switch.

Various compression levels were tested using zstd's benchmarking feature,
and in the end the highest level of compression was picked, since:
1. Unlike some other compression algorithms, zstd's decompression speed
   is generally not affected by the compression level.
2. We only have to perform the compression once (when compiling Python).
3. Even at the highest compression level, it only takes 20 seconds to
   compress the Python archives, compared to the 10 minutes it takes to
   compile Python itself (when using PGO+LTO).

For the Ubuntu 22.04 Python 3.12.3 archive, switching from gzip to zstd
(level 22, with long window mode enabled) results in a 26% reduction in
compressed archive size.
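For reference, a size reduction like the one quoted above can be computed from the two archive sizes. This is a minimal sketch: the function name is hypothetical, and the byte counts in the usage comment are made-up round numbers chosen only to show the arithmetic, not the real archive sizes.

```shell
# Hypothetical helper: percentage size reduction between two archives.
# Uses integer arithmetic, so the result is truncated to a whole percent.
percent_reduction() {
  # $1 = old size in bytes (e.g. the .tar.gz), $2 = new size in bytes (e.g. the .tar.zst)
  echo $(( ($1 - $2) * 100 / $1 ))
}

# Usage with illustrative (not measured) sizes:
#   percent_reduction 1000000 740000
```

In practice the two sizes could be taken from `du --bytes` or `stat` output for the gzip and zstd archives of the same Python build.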

GUS-W-15158299.
GUS-W-15505556.
edmorley committed Apr 18, 2024
1 parent 42d1ba2 commit cae44b2
Showing 3 changed files with 13 additions and 8 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -5,6 +5,7 @@
- Improved the error message shown for EOL Python versions when using a stack for which those versions were never built. ([#1570](https://github.com/heroku/heroku-buildpack-python/pull/1570))
- Fixed the "Python security update is available" warning being shown when the requested version is newer than the latest version known to the buildpack. ([#1569](https://github.com/heroku/heroku-buildpack-python/pull/1569))
- Fixed glibc warnings seen when downgrading the stack version. ([#1568](https://github.com/heroku/heroku-buildpack-python/pull/1568))
+- Changed compression format and S3 URL for Python runtime archives. ([#1567](https://github.com/heroku/heroku-buildpack-python/pull/1567))
- Adjusted compiler options used to build Python for improved parity with the Docker Hub Python images. ([#1566](https://github.com/heroku/heroku-buildpack-python/pull/1566))
- Excluded `LD_LIBRARY_PATH` and `PYTHONHOME` app config vars when invoking subprocesses during the build. ([#1565](https://github.com/heroku/heroku-buildpack-python/pull/1565))

8 changes: 6 additions & 2 deletions bin/steps/python
@@ -35,7 +35,11 @@ case "${PYTHON_VERSION}" in
;;
esac

-PYTHON_URL="${S3_BASE_URL}/${STACK}/runtimes/${PYTHON_VERSION}.tar.gz"
+# The Python runtime archive filename is of form: 'python-X.Y.Z-ubuntu-22.04-amd64.tar.zst'
+# The Ubuntu version is calculated from `STACK` since it's faster than calling `lsb_release`.
+# TODO: Switch to dynamically calculating the architecture when adding support for Heroku-24.
+UBUNTU_VERSION="${STACK/heroku-}.04"
+PYTHON_URL="${S3_BASE_URL}/${PYTHON_VERSION}-ubuntu-${UBUNTU_VERSION}-amd64.tar.zst"

if ! curl --output /dev/null --silent --head --fail --retry 3 --retry-connrefused --connect-timeout 10 "${PYTHON_URL}"; then
puts-warn
@@ -138,7 +142,7 @@ else
# Prepare destination directory.
mkdir -p .heroku/python

-if ! curl --silent --show-error --fail --retry 3 --retry-connrefused --connect-timeout 10 "${PYTHON_URL}" | tar -zxC .heroku/python; then
+if ! curl --silent --show-error --fail --retry 3 --retry-connrefused --connect-timeout 10 "${PYTHON_URL}" | tar --zstd --extract --directory .heroku/python; then
# The Python version was confirmed to exist previously, so any failure here is due to
# a networking issue or archive/buildpack bug rather than the runtime not existing.
puts-warn "Failed to download/install ${PYTHON_VERSION}"
12 changes: 6 additions & 6 deletions builds/build_python_runtime.sh
@@ -10,7 +10,7 @@ PYTHON_MAJOR_VERSION="${PYTHON_VERSION%.*}"
# we install Python into an arbitrary location that intentionally matches neither location.
INSTALL_DIR="/tmp/python"
SRC_DIR="/tmp/src"
-UPLOAD_DIR="/tmp/upload/${STACK}/runtimes"
+UPLOAD_DIR="/tmp/upload"

function error() {
echo "Error: ${1}" >&2
@@ -213,12 +213,12 @@ LD_LIBRARY_PATH="${SRC_DIR}" "${SRC_DIR}/python" -m compileall -f --invalidation
# This symlink must be relative, to ensure that the Python install remains relocatable.
ln -srvT "${INSTALL_DIR}/bin/python3" "${INSTALL_DIR}/bin/python"

-# The tar file is gzipped separately, so we can set a higher gzip compression level than
-# the default. In the future we'll also want to create a second archive that used zstd.
-# Results in a compressed archive filename of form: 'python-X.Y.Z.tar.gz'
-TAR_FILEPATH="${UPLOAD_DIR}/python-${PYTHON_VERSION}.tar"
+# Results in a compressed archive filename of form: 'python-X.Y.Z-ubuntu-22.04-amd64.tar.zst'
+# TODO: Switch to dynamically calculating the architecture when adding support for Heroku-24.
+UBUNTU_VERSION=$(lsb_release --short --release 2>/dev/null)
+TAR_FILEPATH="${UPLOAD_DIR}/python-${PYTHON_VERSION}-ubuntu-${UBUNTU_VERSION}-amd64.tar"
tar --create --format=pax --sort=name --file "${TAR_FILEPATH}" --directory="${INSTALL_DIR}" .
-gzip --best "${TAR_FILEPATH}"
+zstd -T0 -22 --ultra --long --no-progress --rm "${TAR_FILEPATH}"

du --max-depth 1 --human-readable "${INSTALL_DIR}"
du --all --human-readable "${UPLOAD_DIR}"
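
The pack and unpack halves of this change can be exercised together as a local round trip. The sketch below is an illustration rather than the buildpack's own code: the function names and the optional compression level argument are hypothetical, and it assumes GNU tar with `--zstd` support plus the `zstd` CLI are installed. The tar and zstd flags are the ones used in the diff above.

```shell
# Hypothetical helpers mirroring the commands in this commit.
# Assumes GNU tar (with --zstd support) and the zstd CLI are available.
pack_runtime() (
  # $1 = install directory, $2 = output tar path (without '.zst'),
  # $3 = optional compression level (defaults to 22, as in the build script).
  level="${3:-22}"
  tar --create --format=pax --sort=name --file "$2" --directory="$1" . &&
    # --rm deletes the intermediate .tar once '<path>.zst' has been written.
    zstd -T0 "-${level}" --ultra --long --no-progress --rm "$2"
)

unpack_runtime() (
  # $1 = path to the .tar.zst archive, $2 = destination directory
  mkdir -p "$2" && tar --zstd --extract --file "$1" --directory "$2"
)

# Usage (paths are illustrative):
#   pack_runtime /tmp/python /tmp/upload/python-3.12.3-ubuntu-22.04-amd64.tar
#   unpack_runtime /tmp/upload/python-3.12.3-ubuntu-22.04-amd64.tar.zst .heroku/python
```

The buildpack itself streams the archive from `curl` straight into `tar` rather than reading a local file, so this sketch only checks that an archive produced with these flags extracts cleanly.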
