Compress release bundle with zstandard/zstd to reduce size #2400

Closed
DSmithVA opened this issue Jul 31, 2024 · 10 comments
Comments

@DSmithVA

I propose a .zstd download option alongside the existing .gz one for Linux releases. For the latest 2.18.1 linux64 bundle, using zstd instead of gzip cuts the file size by 33%, from 822.8 MiB down to 553.2 MiB.

Example command to convert the existing .gz:
zcat codeql-bundle-linux64.tar.gz | zstd --long=27 -9 -o codeql-bundle-linux64.tar.zstd

File sizes:
862823301 codeql-bundle-linux64.tar.gz
580124258 codeql-bundle-linux64.tar.zstd
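The MiB and percentage figures can be reproduced from the raw byte counts above. The following is a small POSIX shell sketch using only integer arithmetic. The helper names to_mib and pct_saved are illustrative and not part of any tool. MiB values are truncated (not rounded) to one decimal place, which is what matches the figures quoted in this issue.

```shell
#!/bin/sh
# Reproduce the size arithmetic quoted above with POSIX shell integer math.
# to_mib and pct_saved are illustrative helper names, not part of zstd or CodeQL.

# Bytes -> MiB (1 MiB = 1048576 bytes), truncated to one decimal place.
to_mib() {
  tenths=$(( $1 * 10 / 1048576 ))
  printf '%d.%d\n' $(( tenths / 10 )) $(( tenths % 10 ))
}

# Percentage saved going from $1 bytes to $2 bytes, rounded to the nearest integer.
pct_saved() {
  diff=$(( $1 - $2 ))
  printf '%d\n' $(( (diff * 100 + $1 / 2) / $1 ))
}

gz=862823301   # codeql-bundle-linux64.tar.gz
zst=580124258  # codeql-bundle-linux64.tar.zstd (zstd --long=27 -9)

printf 'gzip: %s MiB\n' "$(to_mib "$gz")"
printf 'zstd: %s MiB\n' "$(to_mib "$zst")"
printf 'saved: %s%%\n' "$(pct_saved "$gz" "$zst")"
```

Applying pct_saved to the 2.20.1 numbers later in this thread (608400767 bytes as published, 455054249 bytes after recompressing with --long=27) gives the 25% quoted there.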

As for zstd arguments, compression levels above -9 showed diminishing returns, though -19 does get down to 504.5 MiB at the cost of 12x longer compression time. Higher --long= values improve compression, but 27 (a 2^27-byte, i.e. 128 MiB, window) is the highest value that decompressors handle by default, per https://github.com/facebook/zstd/blob/dev/programs/zstd.1.md?plain=1#L162

Compression with xz is also an improvement; it's just noticeably slower. Either is an improvement over plain .gz, and any recent Linux distribution supports both .zstd and .xz decompression.

@jketema

jketema commented Jul 31, 2024

Thanks for your feedback. We'll take this into consideration.

@marcellodesales

marcellodesales commented Jan 21, 2025

This was implemented, but it is now failing on GitHub Enterprise because the base Docker images running in ARC don't include zstd in any of their tags, and the current v3 tag points to a version that requires zstd.

So CodeQL can't be initialized on the default runner. I don't see any release of the runner that includes zstd in https://github.com/actions/runner/blob/main/images/Dockerfile ...

@DSmithVA
Author

That is unfortunate. Also, the .zst archives being published are much larger than necessary because they were not compressed with the --long=27 flag. For the most recent Linux bundle, recompressing with --long=27 gives a 25% smaller file:

curl -LO https://github.com/github/codeql-action/releases/download/codeql-bundle-v2.20.1/codeql-bundle-linux64.tar.zst
stat -c %s codeql-bundle-linux64.tar.zst
608400767
cat codeql-bundle-linux64.tar.zst | zstd -d | zstd --long=27 | wc -c
455054249

@hvitved

hvitved commented Jan 21, 2025

@marcellodesales: According to our engineers, the job should only download the .zst bundle when zstd exists on the PATH, falling back to the gzip-compressed .tar.gz bundle if it doesn't. Could you please rerun the job in debug mode and, if possible, upload the log files so that we can debug the issue?
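As a rough illustration of the selection rule described here, the following POSIX shell sketch prefers the .tar.zst bundle only when a zstd executable is on PATH and otherwise falls back to .tar.gz. It is not the CodeQL Action's implementation (the action itself is written in TypeScript); pick_codeql_bundle and tar_flag_for are hypothetical names, and GNU tar is assumed for the --zstd flag.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the CodeQL Action's code: choose which bundle to
# download based on whether a zstd executable is available on PATH.

# Print the preferred linux64 bundle file name.
pick_codeql_bundle() {
  if command -v zstd >/dev/null 2>&1; then
    printf '%s\n' "codeql-bundle-linux64.tar.zst"
  else
    # No zstd on PATH: fall back to the gzip-compressed bundle.
    printf '%s\n' "codeql-bundle-linux64.tar.gz"
  fi
}

# Print the GNU tar decompression flag matching an archive name.
tar_flag_for() {
  case "$1" in
    *.tar.zst) printf '%s\n' "--zstd" ;;
    *.tar.gz)  printf '%s\n' "-z" ;;
    *) printf 'unsupported archive: %s\n' "$1" >&2; return 1 ;;
  esac
}

bundle=$(pick_codeql_bundle)
printf 'would run: tar -x %s -f %s\n' "$(tar_flag_for "$bundle")" "$bundle"
```

Checking for the zstd executable itself matters because GNU tar's --zstd option does not decompress on its own: tar execs an external zstd program, which is the "zstd: Cannot exec" failure visible in the log below.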

@hvitved

hvitved commented Jan 21, 2025

@DSmithVA: Thanks a lot for bringing this to our attention; we are currently testing this approach, and it does indeed look promising.

@marcellodesales

@hvitved I have also posted the info below at #2705 (comment). All languages fail with the latest version.

🔧 Settings

      - name: Initialize CodeQL
        uses: github/codeql-action/init@v3.28.1
        with:
          debug: true
          languages: go
          build-mode: "manual"
          config-file: .github/codeql-config.yml

⌨ Logs

##[debug]Evaluating condition for step: 'Initialize CodeQL'
##[debug]Evaluating: success()
##[debug]Evaluating success:
##[debug]=> true
##[debug]Result: true
##[debug]Starting: Initialize CodeQL
##[debug]Register post job cleanup for action: github/codeql-action/init@v3.28.1
##[debug]Loading inputs
##[debug]Evaluating: secrets.ACCESS_TOKEN
##[debug]Evaluating Index:
##[debug]..Evaluating secrets:
##[debug]..=> Object
##[debug]..Evaluating String:
##[debug]..=> 'ACCESS_TOKEN'
##[debug]=> null
##[debug]Result: null
##[debug]Evaluating: github.token
##[debug]Evaluating Index:
##[debug]..Evaluating github:
##[debug]..=> Object
##[debug]..Evaluating String:
##[debug]..=> 'token'
##[debug]=> '***'
##[debug]Result: '***'
##[debug]Evaluating: toJson(matrix)
##[debug]Evaluating toJson:
##[debug]..Evaluating matrix:
##[debug]..=> null
##[debug]=> 'null'
##[debug]Result: 'null'
##[debug]Loading env
Run github/codeql-action/init@v3.28.1
  
Job run UUID is 0cde5708-94e4-46a6-80e2-deb7dfb95ff0.
##[debug]Running git command: git rev-parse HEAD
##[debug]Sending status report: {"action_name":"init","action_oid":"unknown","action_ref":"v3.28.1","action_started_at":"2025-01-21T17:23:55.998Z","action_version":"3.28.1","analysis_key":".github/workflows/codeql-golang.yml:analyze","commit_oid":"d8d1429bce66e76202d14b5cc22251b91dfaa91f","first_party_analysis":true,"job_name":"analyze","job_run_uuid":"0cde5708-94e4-46a6-80e2-deb7dfb95ff0","ref":"refs/pull/104/merge","runner_os":"Linux","started_at":"2025-01-21T17:23:55.998Z","status":"starting","steady_state_default_setup":false,"testing_environment":"","workflow_name":"codeQL-golang","workflow_run_attempt":2,"workflow_run_id":2678337,"actions_event_name":"pull_request","runner_available_disk_space_bytes":7419637760,"runner_total_disk_space_bytes":8589934592,"matrix_vars":"null","runner_arch":"X64"}
::group::Setup CodeQL tools
Setup CodeQL tools
  ##[debug]Found tar.
  ##[debug]Could not find zstd: Error: Unable to locate executable file: zstd. Please verify either the file path exists or the file can be found within a directory specified by the PATH environment variable. Also check the file mode to verify the file is executable.
  /usr/bin/tar --version
  tar (GNU tar) 1.34
  Copyright (C) 2021 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.
  
  Written by John Gilmore and Jay Fenlason.
  Found gnu tar version 1.34.
  ##[debug]Attempting to obtain CodeQL tools. CLI version: 2.20.1, bundle tag name: codeql-bundle-v2.20.1, URL: unspecified.
  ##[debug]isExplicit: 2.20.1
  ##[debug]explicit? true
  ##[debug]checking cache: /home/runner/_work/_tool/CodeQL/2.20.1/x64
  ##[debug]not found
  ##[debug]Didn't find a version of the CodeQL tools in the toolcache with a version number exactly matching 2.20.1.
  ##[debug]Found the following versions of the CodeQL tools in the toolcache: [].
  ##[debug]Didn't find any versions of the CodeQL tools starting with 2.20.1 in the toolcache. Trying next fallback method.
  ##[debug]Computed a fallback toolcache version number of 2.20.1 for CodeQL version 2.20.1.
  ##[debug]isExplicit: 2.20.1
  ##[debug]explicit? true
  ##[debug]checking cache: /home/runner/_work/_tool/CodeQL/2.20.1/x64
  ##[debug]not found
  Did not find CodeQL tools version 2.20.1 in the toolcache.
  ##[debug]Did not find any candidate pinned versions of the CodeQL tools in the toolcache.
  Found CodeQL bundle in github/codeql-action on https://git.company.com with URL https://git.company.com/api/v3/repos/github/codeql-action/releases/assets/5565.
  Using CodeQL CLI version 2.20.1 sourced from https://git.company.com/api/v3/repos/github/codeql-action/releases/assets/5565 .
  ##[debug]Providing an authorization token to download CodeQL tools.
  ##[debug]Not running against github.com. Disabling all toggleable features.
  ##[debug]Writing feature flags to /home/runner/_work/_temp/cached-feature-flags.json
  ##[debug]Feature 'extract_to_toolcache' undefined in API response.
  ##[debug]Feature extract_to_toolcache is disabled due to its default value.
  Downloading CodeQL tools from https://git.company.com/api/v3/repos/github/codeql-action/releases/assets/5565 . This may take a while.
  Streaming the extraction of the CodeQL bundle.
  ##[debug]Extracting to /home/runner/_work/_temp/c2146770-b178-4be5-9164-0a0e8345e244. Input stream has high water mark 4194304.
  tar -x --zstd --warning=no-unknown-keyword --overwrite -f - -C /home/runner/_work/_temp/c2146770-b178-4be5-9164-0a0e8345e244
  tar (grandchild): zstd: Cannot exec: No such file or directory
  tar (grandchild): Error is not recoverable: exiting now
  tar: Child died with signal 13
  tar: Error is not recoverable: exiting now
  ##[debug]Cleaning up extraction destination directory.
  ##[debug]Cleaned up extraction destination directory.
  Warning: Failed to download and extract CodeQL bundle using streaming with error: Error while downloading and extracting tar: Error: write EPIPE
  Warning: Falling back to downloading the bundle before extracting.
  ##[debug]Cleaning up CodeQL bundle.
  Warning: Failed to clean up CodeQL bundle: no files found matching /home/runner/_work/_temp/c2146770-b178-4be5-9164-0a0e8345e244.
  ##[debug]Downloading https://git.company.com/api/v3/repos/github/codeql-action/releases/assets/5565
  ##[debug]Destination /home/runner/_work/_temp/ca3b4527-1a21-43d9-8713-81909027bb0a
  ##[debug]set auth
  ##[debug]download complete
  Finished downloading CodeQL bundle to /home/runner/_work/_temp/ca3b4527-1a21-43d9-8713-81909027bb0a (11.1s).
  Extracting CodeQL bundle.
  ##[debug]Extracting to /home/runner/_work/_temp/c2146770-b178-4be5-9164-0a0e8345e244.
  tar -x --zstd --warning=no-unknown-keyword --overwrite -f /home/runner/_work/_temp/ca3b4527-1a21-43d9-8713-81909027bb0a -C /home/runner/_work/_temp/c2146770-b178-4be5-9164-0a0e8345e244
  tar (child): zstd: Cannot exec: No such file or directory
  tar (child): Error is not recoverable: exiting now
  tar: Child returned status 2
  tar: Error is not recoverable: exiting now
  ##[debug]Cleaning up extraction destination directory.
  ##[debug]Cleaned up extraction destination directory.
  ##[debug]Cleaning up CodeQL bundle archive.
  ##[debug]Cleaned up CodeQL bundle archive.
  Error: Unable to download and extract CodeQL CLI: Failed to run "tar -x --zstd --warning=no-unknown-keyword --overwrite -f /home/runner/_work/_temp/ca3b4527-1a21-43d9-8713-81909027bb0a -C /home/runner/_work/_temp/c2146770-b178-4be5-9164-0a0e8345e244". Exit code was 2 and last log line was: n/a. See the logs for more details.
  
  Details: Error: Failed to run "tar -x --zstd --warning=no-unknown-keyword --overwrite -f /home/runner/_work/_temp/ca3b4527-1a21-43d9-8713-81909027bb0a -C /home/runner/_work/_temp/c2146770-b178-4be5-9164-0a0e8345e244". Exit code was 2 and last log line was: n/a. See the logs for more details.
      at ChildProcess.<anonymous> (/home/runner/_work/_actions/github/codeql-action/v3.28.1/lib/tar.js:171:28)
      at ChildProcess.emit (node:events:519:28)
      at ChildProcess._handle.onexit (node:internal/child_process:294:12)
  ##[debug]Running git command: git rev-parse HEAD
  ##[debug]Sending status report: {"action_name":"init","action_oid":"unknown","action_ref":"v3.28.1","action_started_at":"2025-01-21T17:23:55.998Z","action_version":"3.28.1","analysis_key":".github/workflows/codeql-golang.yml:analyze","commit_oid":"d8d1429bce66e76202d14b5cc22251b91dfaa91f","first_party_analysis":true,"job_name":"analyze","job_run_uuid":"0cde5708-94e4-46a6-80e2-deb7dfb95ff0","ref":"refs/pull/104/merge","runner_os":"Linux","started_at":"2025-01-21T17:23:55.998Z","status":"aborted","steady_state_default_setup":false,"testing_environment":"","workflow_name":"codeQL-golang","workflow_run_attempt":2,"workflow_run_id":2678337,"actions_event_name":"pull_request","runner_available_disk_space_bytes":7419633664,"runner_total_disk_space_bytes":8589934592,"cause":"Unable to download and extract CodeQL CLI: Failed to run \"tar -x --zstd --warning=no-unknown-keyword --overwrite -f /home/runner/_work/_temp/ca3b4527-1a21-43d9-8713-81909027bb0a -C /home/runner/_work/_temp/c2146770-b178-4be5-9164-0a0e8345e244\". Exit code was 2 and last log line was: n/a. See the logs for more details.\n\nDetails: Error: Failed to run \"tar -x --zstd --warning=no-unknown-keyword --overwrite -f /home/runner/_work/_temp/ca3b4527-1a21-43d9-8713-81909027bb0a -C /home/runner/_work/_temp/c2146770-b178-4be5-9164-0a0e8345e244\". Exit code was 2 and last log line was: n/a. See the logs for more details.\n    at ChildProcess.<anonymous> (/home/runner/_work/_actions/github/codeql-action/v3.28.1/lib/tar.js:171:28)\n    at ChildProcess.emit (node:events:519:28)\n    at ChildProcess._handle.onexit (node:internal/child_process:294:12)","exception":"Error: Unable to download and extract CodeQL CLI: Failed to run \"tar -x --zstd --warning=no-unknown-keyword --overwrite -f /home/runner/_work/_temp/ca3b4527-1a21-43d9-8713-81909027bb0a -C /home/runner/_work/_temp/c2146770-b178-4be5-9164-0a0e8345e244\". Exit code was 2 and last log line was: n/a. See the logs for more details.\n\nDetails: Error: Failed to run \"tar -x --zstd --warning=no-unknown-keyword --overwrite -f /home/runner/_work/_temp/ca3b4527-1a21-43d9-8713-81909027bb0a -C /home/runner/_work/_temp/c2146770-b178-4be5-9164-0a0e8345e244\". Exit code was 2 and last log line was: n/a. See the logs for more details.\n    at ChildProcess.<anonymous> (/home/runner/_work/_actions/github/codeql-action/v3.28.1/lib/tar.js:171:28)\n    at ChildProcess.emit (node:events:519:28)\n    at ChildProcess._handle.onexit (node:internal/child_process:294:12)\n    at setupCodeQL (/home/runner/_work/_actions/github/codeql-action/v3.28.1/lib/codeql.js:150:15)\n    at async initCodeQL (/home/runner/_work/_actions/github/codeql-action/v3.28.1/lib/init.js:55:97)\n    at async run (/home/runner/_work/_actions/github/codeql-action/v3.28.1/lib/init-action.js:175:34)\n    at async runWrapper (/home/runner/_work/_actions/github/codeql-action/v3.28.1/lib/init-action.js:436:9)","completed_at":"2025-01-21T17:24:08.201Z","matrix_vars":"null","runner_arch":"X64"}
  ##[debug]Node Action run completed with exit code 1
  ##[debug]CODEQL_ACTION_FEATURE_MULTI_LANGUAGE='false'
  ##[debug]CODEQL_ACTION_FEATURE_SANDWICH='false'
  ##[debug]CODEQL_ACTION_FEATURE_SARIF_COMBINE='true'
  ##[debug]CODEQL_ACTION_FEATURE_WILL_UPLOAD='true'
  ##[debug]CODEQL_ACTION_VERSION='3.28.1'
  ##[debug]CODEQL_ACTION_WARNED_ABOUT_VERSION='true'
  ##[debug]JOB_RUN_UUID='0cde5708-94e4-46a6-80e2-deb7dfb95ff0'
  ##[debug]CODEQL_ACTION_INIT_HAS_RUN='true'
  ##[debug]CODEQL_ACTION_ANALYSIS_KEY='.github/workflows/codeql-golang.yml:analyze'
  ##[debug]CODEQL_WORKFLOW_STARTED_AT='2025-01-21T17:23:55.998Z'
  ##[debug]CODEQL_ACTION_JOB_STATUS='JOB_STATUS_FAILURE'
  ##[debug]Save intra-action state persisted_inputs = [["INPUT_DEBUG","true"],["INPUT_LANGUAGES","go"],["INPUT_BUILD-MODE","manual"],["INPUT_CONFIG-FILE",".github/codeql-config.yml"],["INPUT_QUERIES","security-extended,security-and-quality"],["INPUT_EXTERNAL-REPOSITORY-TOKEN",""],["INPUT_TOOLS",""],["INPUT_TOKEN","***"],["INPUT_REGISTRIES",""],["INPUT_MATRIX","null"],["INPUT_DB-LOCATION",""],["INPUT_CONFIG",""],["INPUT_PACKS",""],["INPUT_SETUP-PYTHON-DEPENDENCIES",""],["INPUT_SOURCE-ROOT",""],["INPUT_RAM",""],["INPUT_THREADS",""],["INPUT_DEBUG-ARTIFACT-NAME",""],["INPUT_DEBUG-DATABASE-NAME",""],["INPUT_TRAP-CACHING",""],["INPUT_DEPENDENCY-CACHING",""]]
  ##[debug]Finishing: Initialize CodeQL

@hvitved

hvitved commented Jan 22, 2025

@henrymercer: Is the log above sufficient for you to debug? I notice the line

##[debug]Could not find zstd: Error: Unable to locate executable file: zstd. Please verify either the file path exists or the file can be found within a directory specified by the PATH environment variable. Also check the file mode to verify the file is executable.

which suggests that we should be detecting that zstd is not present?

@henrymercer
Contributor

Thanks for the debug logs @marcellodesales. #2710 should fix this issue. I'll let you know once this is available in a stable release — this should be ready by the end of the week.

marcellodesales added a commit to marcellodesales/runner that referenced this issue Jan 22, 2025
This is based on the problems reported at github/codeql-action#2705 and github/codeql-action#2400, where the base Docker image doesn't include the zstd compression tool. The error occurs when running CodeQL:

 Finished downloading CodeQL bundle to /home/runner/_work/_temp/ca3b4527-1a21-43d9-8713-81909027bb0a (11.1s).
  Extracting CodeQL bundle.
  ##[debug]Extracting to /home/runner/_work/_temp/c2146770-b178-4be5-9164-0a0e8345e244.
  tar -x --zstd --warning=no-unknown-keyword --overwrite -f /home/runner/_work/_temp/ca3b4527-1a21-43d9-8713-81909027bb0a -C /home/runner/_work/_temp/c2146770-b178-4be5-9164-0a0e8345e244
  tar (child): zstd: Cannot exec: No such file or directory
  tar (child): Error is not recoverable: exiting now
  tar: Child returned status 2
  tar: Error is not recoverable: exiting now
  ##[debug]Cleaning up extraction destination directory.
  ##[debug]Cleaned up extraction destination directory.
  ##[debug]Cleaning up CodeQL bundle archive.
  ##[debug]Cleaned up CodeQL bundle archive.
  Error: Unable to download and extract CodeQL CLI: Failed to run "tar -x --zstd --warning=no-unknown-keyword --overwrite -f /home/runner/_work/_temp/ca3b4527-1a21-43d9-8713-81909027bb0a -C /home/runner/_work/_temp/c2146770-b178-4be5-9164-0a0e8345e244". Exit code was 2 and last log line was: n/a. See the logs for more details.
  
  Details: Error: Failed to run "tar -x --zstd --warning=no-unknown-keyword --overwrite -f /home/runner/_work/_temp/ca3b4527-1a21-43d9-8713-81909027bb0a -C /home/runner/_work/_temp/c2146770-b178-4be5-9164-0a0e8345e244". Exit code was 2 and last log line was: n/a. See the logs for more details.
      at ChildProcess.<anonymous> (/home/runner/_work/_actions/github/codeql-action/v3.28.1/lib/tar.js:171:28)
      at ChildProcess.emit (node:events:519:28)
      at ChildProcess._handle.onexit (node:internal/child_process:294:12)

Why: it will drastically speed up downloading CodeQL.

A fix was pushed in github/codeql-action#2710, but it hasn't been released yet. Simply including zstd in the image ensures that the smaller zstd-compressed bundle is used instead of the gzip-compressed tarball.
@henrymercer
Contributor

@marcellodesales The fix is now released as part of v3.28.3. I've asked in the other thread whether you would be able to verify the fix by updating to the latest version of the CodeQL Action.

@DSmithVA Thanks again for bringing this to our attention. CodeQL Bundle v2.20.4 will ship with a reduced bundle size.

I'll close this issue.

@marcellodesales

marcellodesales commented Jan 23, 2025

@henrymercer Thank you for providing the fix! I will verify it this week and report back. I also created a PR in the runner project to add zstd to the base image for faster execution: actions/runner#3670

Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants