[CI][Python] Improve vcpkg caching #43951
Comments
It's a bit tricky to do this within Docker but should be doable. There is a similar issue open for java-jars. cc @danepitkin
FWIW got vcpkg caching working within cibuildwheel here: It mostly seems to depend on passing:
into the container
And also
By the way, it seems other sources of binary artifacts are supported:
Also,
We could also consider building all wheels for the various Python versions in a single build (which is what typically happens when, e.g., using cibuildwheel). It would make a single build longer of course, but reduce the overall CI time. Now, our pyarrow build and test runs take quite a while, so maybe this would get too long for a single build.
It's quite bad for developer productivity to make the wheel build slower. I would rather we make the existing builds faster. Currently, when the vcpkg step runs, a manylinux wheel build run takes 1h15. When the vcpkg step is cached in the Docker image, a manylinux wheel build run takes 20 minutes. vcpkg binary caching would hopefully achieve similar results (probably not as good, but still).
Of which half is setting up the image and building Arrow C++, which strictly speaking also does not need to be repeated for every Python version. But yes, if a build is failing for a specific Python version, it would be annoying that they are all combined and you couldn't easily trigger a single Python version (it's always a trade-off).
Building Arrow C++ (and perhaps PyArrow) could be made faster using ccache/sccache. Apparently that's not the case currently:
We could consolidate all wheels into a single workflow (or one per OS), where Arrow C++ and deps are built once with the best possible caching and the artifacts are distributed to multiple wheel build jobs. I think this would be the best compromise between overall runtime and efficient use of CI time.
Thinking back on the last few releases, I think mostly all wheel jobs fail together vs. issues with a specific action, excluding maybe RCs of new Python versions.
That would also make local reproduction using
Also, perhaps there could be several sources: a GHA one and a file-based one as fallback (if not on GHA?). Something like:
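The snippet that followed this comment did not survive extraction, so the following is only a minimal sketch of the idea, not the commenter's actual proposal. It assumes vcpkg's `VCPKG_BINARY_SOURCES` syntax with the `files` provider and the experimental `x-gha` provider described in the vcpkg documentation linked from this issue (availability of `x-gha` depends on the vcpkg version). The function name `binary_sources` and the cache directory are hypothetical.

```python
import os


def binary_sources(cache_dir, env=None):
    """Build a VCPKG_BINARY_SOURCES value (hypothetical helper).

    Uses the GitHub Actions cache when the job appears to run on GHA and
    the cache credentials have been exported into the environment;
    otherwise falls back to a file-based cache in `cache_dir`, which
    vcpkg expects to be an absolute path.
    """
    env = os.environ if env is None else env
    on_gha = (
        env.get("GITHUB_ACTIONS") == "true"
        and bool(env.get("ACTIONS_CACHE_URL"))
        and bool(env.get("ACTIONS_RUNTIME_TOKEN"))
    )
    # "clear" drops vcpkg's default sources so only ours are used.
    sources = ["clear"]
    if on_gha:
        sources.append("x-gha,readwrite")
    else:
        sources.append(f"files,{cache_dir},readwrite")
    return ";".join(sources)


if __name__ == "__main__":
    # Print the value so it can be exported before invoking vcpkg.
    print(binary_sources("/tmp/vcpkg-binary-cache"))
```

When the build runs inside a container, the resulting value (and, for the GHA case, the two `ACTIONS_*` variables) would still have to be passed into the container for vcpkg to see it.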
Implemented: #44644 (review). The manylinux and java-jars jobs now use a NuGet + GitHub Packages based cache.
### Rationale for this change

We're using only Docker level cache for vcpkg used for wheels. If we have any vcpkg related changes, all vcpkg ports are rebuilt. It's time consuming.

### What changes are included in this PR?

Enable NuGet + GitHub Packages based cache. It's port level cache. So we don't need to rebuild all ports when we have any vcpkg related changes.

See also: https://learn.microsoft.com/en-us/vcpkg/consume/binary-caching-github-packages

NuGet + GitHub Packages based cache isn't enabled with manylinux2014 + aarch64. Because EPEL for CentOS 7 + aarch64 provides old Mono. (FYI: EPEL for CentOS 7 + x86_64 provides newer Mono.) We can't use old Mono to run NuGet on Linux.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.

* GitHub Issue: #43951

Lead-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Raúl Cumplido <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
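As a rough illustration of the behaviour described in the PR, the sketch below composes a `VCPKG_BINARY_SOURCES` value for a NuGet feed hosted on GitHub Packages and skips it for manylinux2014 + aarch64. It is not the code from the PR: the function name, its parameters, and the owner name are hypothetical, and the feed URL pattern is the one given in the vcpkg documentation linked above.

```python
def nuget_binary_sources(owner, manylinux, arch):
    """Return a VCPKG_BINARY_SOURCES value, or None if NuGet is unusable.

    Hypothetical helper. Returns None for manylinux2014 + aarch64, where
    the PR says the available Mono is too old to run NuGet, so the build
    would rely on the Docker level cache only.
    """
    if manylinux == "2014" and arch == "aarch64":
        return None
    # GitHub Packages NuGet feed for the given owner (user or organization).
    feed = f"https://nuget.pkg.github.com/{owner}/index.json"
    return f"clear;nuget,{feed},readwrite"


if __name__ == "__main__":
    # "example-org" is a placeholder owner, not a real configuration.
    print(nuget_binary_sources("example-org", "2014", "x86_64"))
    print(nuget_binary_sources("example-org", "2014", "aarch64"))
```

Because the cache is keyed per port, a change to one port would invalidate only that port's package (and its dependents) rather than the whole Docker layer.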
Issue resolved by pull request 44644
Thanks a lot @kou!
Describe the enhancement requested
We use vcpkg to build bundled dependencies for Python wheels. Unfortunately, it often happens that the Docker image gets rebuilt, and therefore all the dependencies are recompiled from scratch. This makes build times very long (random example here).
It would be nice to use a vcpkg binary cache on CI here, especially as we always build the same dependency versions regardless of the targeted Python version. There are examples here: https://learn.microsoft.com/en-us/vcpkg/consume/binary-caching-github-actions-cache
Component(s)
C++, Continuous Integration, Python