From b608c8a68f0202785211c27813d77ffb28b992b9 Mon Sep 17 00:00:00 2001
From: Ed Morley <501702+edmorley@users.noreply.github.com>
Date: Tue, 16 Apr 2024 17:58:00 +0100
Subject: [PATCH] Change compression format and S3 URL for Python runtime archives

As part of the CNB multi-architecture support work, we need to change
the Python runtime archive S3 URLs to include the architecture name.

In addition, for the CNB transition from "stacks" to "targets", it
would be helpful to switch the URL scheme from stack ID references
(such as `heroku-22`) to the distro name and version (eg `ubuntu` and
`22.04`), which are available to CNBs via the CNB targets feature. See:
https://github.com/buildpacks/spec/blob/buildpack/0.10/buildpack.md#targets-1

Rather than duplicate the Python archives on S3 under different
filenames, it makes sense to migrate this buildpack to the new archive
names too, so that the same S3 archives can be used by both this
buildpack and the CNB.

Moving to new archive names/URLs also means we can safely regenerate
all existing Python versions to pick up the changes in #1566 (and
changes made in the past, such as #1320). Because the old archives
won't be overwritten, rolling back to an older buildpack version will
still work as expected.

Since we're changing the S3 URLs anyway, now is also a good time to
switch the archive compression format from gzip to Zstandard (something
that's long overdue). Zstandard (aka zstd) is a much better compression
format than gzip (smaller archives and much faster decompression), and
is seeing widespread adoption across multiple ecosystems (eg APT
packages, Docker images, web browsers etc). See:
https://github.com/facebook/zstd
https://github.com/facebook/zstd/blob/dev/programs/README.md#usage-of-command-line-interface

Our base images already have `zstd` installed (and for the Rust-based
CNB, the `zstd` crate is available), so it's an easy switch.
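For illustration only, the new naming scheme can be expressed as a
small shell function. `archive_filename` is a hypothetical helper and
is not part of this change. It assumes that stack IDs of the form
`heroku-NN` map to Ubuntu `NN.04`, which holds for Heroku-20 and
Heroku-22. The optional architecture argument, defaulting to `amd64`,
is likewise an assumption about how the currently hard-coded `amd64`
might later be parameterised.

```shell
# Hypothetical helper (illustration only, not part of this patch).
# Prints a runtime archive filename of the form:
#   python-X.Y.Z-ubuntu-<ubuntu version>-<arch>.tar.zst
# Assumes stack IDs look like 'heroku-NN' and map to Ubuntu 'NN.04'.
archive_filename() {
	python_version="${1}" # eg 'python-3.12.3', as read from runtime.txt
	stack="${2}"          # eg 'heroku-22'
	arch="${3:-amd64}"    # assumed default, matching the hard-coded 'amd64'
	# '${stack#heroku-}' strips the 'heroku-' prefix. For stack IDs like
	# 'heroku-22' this gives the same result as the bash-only
	# '${STACK/heroku-}' substitution, but is also valid in POSIX sh.
	ubuntu_version="${stack#heroku-}.04"
	printf '%s-ubuntu-%s-%s.tar.zst\n' "${python_version}" "${ubuntu_version}" "${arch}"
}
```

For example, `archive_filename python-3.12.3 heroku-22` prints
`python-3.12.3-ubuntu-22.04-amd64.tar.zst`.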
Various compression levels were tested using zstd's benchmarking
feature, and in the end the highest compression level was picked,
since:

1. Unlike some other compression algorithms, zstd's decompression speed
   is generally not affected by the compression level.
2. We only have to perform the compression once (when compiling Python).
3. Even at the highest compression level, it only takes 20 seconds to
   compress the Python archives, compared to the 10 minutes it takes to
   compile Python itself (when using PGO+LTO).

For the Ubuntu 22.04 Python 3.12.3 archive, switching from gzip to zstd
(level 22, with long window mode enabled) results in a 26% reduction in
compressed archive size.

GUS-W-15158299.
GUS-W-15505556.
---
 CHANGELOG.md                   |  1 +
 bin/steps/python               |  9 ++++++---
 builds/build_python_runtime.sh | 12 ++++++------
 3 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 6110a9c76..011963656 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,7 @@
 
 ## [Unreleased]
 
+- Changed compression format and S3 URL for Python runtime archives. ([#1567](https://github.com/heroku/heroku-buildpack-python/pull/1567))
 - Adjusted compiler options used to build Python for improved parity with the Docker Hub Python images. ([#1566](https://github.com/heroku/heroku-buildpack-python/pull/1566))
 - Excluded `LD_LIBRARY_PATH` and `PYTHONHOME` app config vars when invoking subprocesses during the build. ([#1565](https://github.com/heroku/heroku-buildpack-python/pull/1565))
 
diff --git a/bin/steps/python b/bin/steps/python
index b231e92bb..95046ac1b 100755
--- a/bin/steps/python
+++ b/bin/steps/python
@@ -8,8 +8,11 @@ runtime-fixer runtime.txt || true
 
 PYTHON_VERSION=$(cat runtime.txt)
 
-# The location of the pre-compiled python binary.
-PYTHON_URL="${S3_BASE_URL}/${STACK}/runtimes/${PYTHON_VERSION}.tar.gz"
+# The Python runtime archive filename is of form: 'python-X.Y.Z-ubuntu-22.04-amd64.tar.zst'
+# The Ubuntu version is calculated from `STACK` since it's faster than calling `lsb_release`.
+# TODO: Switch to dynamically calculating the architecture when adding support for Heroku-24.
+UBUNTU_VERSION="${STACK/heroku-}.04"
+PYTHON_URL="${S3_BASE_URL}/${PYTHON_VERSION}-ubuntu-${UBUNTU_VERSION}-amd64.tar.zst"
 
 if ! curl --output /dev/null --silent --head --fail --retry 3 --retry-connrefused --connect-timeout 10 "${PYTHON_URL}"; then
 	puts-warn "Requested runtime '${PYTHON_VERSION}' is not available for this stack (${STACK})."
@@ -135,7 +138,7 @@ else
 	# Prepare destination directory.
 	mkdir -p .heroku/python
 
-	if ! curl --silent --show-error --fail --retry 3 --retry-connrefused --connect-timeout 10 "${PYTHON_URL}" | tar -zxC .heroku/python; then
+	if ! curl --silent --show-error --fail --retry 3 --retry-connrefused --connect-timeout 10 "${PYTHON_URL}" | tar --zstd --extract --directory .heroku/python; then
 		# The Python version was confirmed to exist previously, so any failure here is due to
 		# a networking issue or archive/buildpack bug rather than the runtime not existing.
 		puts-warn "Failed to download/install ${PYTHON_VERSION}"
diff --git a/builds/build_python_runtime.sh b/builds/build_python_runtime.sh
index 814267ad0..dfd7bc368 100755
--- a/builds/build_python_runtime.sh
+++ b/builds/build_python_runtime.sh
@@ -10,7 +10,7 @@ PYTHON_MAJOR_VERSION="${PYTHON_VERSION%.*}"
 # we install Python into an arbitrary location that intentionally matches neither location.
 INSTALL_DIR="/tmp/python"
 SRC_DIR="/tmp/src"
-UPLOAD_DIR="/tmp/upload/${STACK}/runtimes"
+UPLOAD_DIR="/tmp/upload"
 
 function error() {
 	echo "Error: ${1}" >&2
@@ -213,12 +213,12 @@ LD_LIBRARY_PATH="${SRC_DIR}" "${SRC_DIR}/python" -m compileall -f --invalidation
 # This symlink must be relative, to ensure that the Python install remains relocatable.
 ln -srvT "${INSTALL_DIR}/bin/python3" "${INSTALL_DIR}/bin/python"
 
-# The tar file is gzipped separately, so we can set a higher gzip compression level than
-# the default. In the future we'll also want to create a second archive that used zstd.
-# Results in a compressed archive filename of form: 'python-X.Y.Z.tar.gz'
-TAR_FILEPATH="${UPLOAD_DIR}/python-${PYTHON_VERSION}.tar"
+# Results in a compressed archive filename of form: 'python-X.Y.Z-ubuntu-22.04-amd64.tar.zst'
+# TODO: Switch to dynamically calculating the architecture when adding support for Heroku-24.
+UBUNTU_VERSION=$(lsb_release --short --release 2>/dev/null)
+TAR_FILEPATH="${UPLOAD_DIR}/python-${PYTHON_VERSION}-ubuntu-${UBUNTU_VERSION}-amd64.tar"
 tar --create --format=pax --sort=name --file "${TAR_FILEPATH}" --directory="${INSTALL_DIR}" .
-gzip --best "${TAR_FILEPATH}"
+zstd -T0 -22 --ultra --long --no-progress --rm "${TAR_FILEPATH}"
 
 du --max-depth 1 --human-readable "${INSTALL_DIR}"
 du --all --human-readable "${UPLOAD_DIR}"