-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support precompiled library binaries #116
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #116 +/- ##
==========================================
+ Coverage 90.91% 93.78% +2.86%
==========================================
Files 3 9 +6
Lines 2257 3619 +1362
==========================================
+ Hits 2052 3394 +1342
- Misses 205 225 +20 ☔ View full report in Codecov by Sentry. |
I'm not a big fan of downloading prebuilt, how do you know it's built for the right platform? OS, arch, libc, etc... |
From a license POV if that tarball contains (L)GPL code you'll also have to provide script to rebuild it. |
The idea is to allow binaries to be specified using [package.metadata.system-deps.A.'cfg(target = "...")']
url = "..." If binaries are requested and there is not a prebuilt archive available for that target, we can always fall back to the default behaviour of
That would be something that the projects creating the tarballs would have to include in them, not something Supporting source code downloads is easy. However, I worry that there is too much complexity on the compilation part. Projects have many differences on their build systems and prerequisites, so we would have to find a way of accounting for that. I'm all for adding it as an option, but I think having prebuilts can be very useful in some situations, specially for users with low end hardware that can't afford the compilation time. And using prebuilts would always be an user configuration option in any case. |
That's indeed the problem of projects using this feature, not system-deps' problem. Maybe worth mentioning in system-deps doc?
Makes sense, that can at least help building for major platforms that has a prebuilt provided. Another question: What's the expected file tree contained inside the tarball? How do you typically generate such tarball? In the GStreamer case I imagine that's going to be a sysroot generated by cerbero? If that's the case, I believe you'll have the problem that .pc files it contains won't be relocatable, that means it will have |
BTW, Meson can now generate relocatable pc files, not sure cerbero uses that feature, and it won't work for non-Meson libraries anyway: https://mesonbuild.com/Pkgconfig-module.html#relocatable-pkgconfig-files. |
Yes, that is a good idea!
That's what the pkg_paths = [ "lib/x86_64-linux-gnu/pkgconfig", "lib/x86_64-linux-gnu/gstreamer-1.0/pkgconfig" ]
Another thing to make sure to specify in the docs is that this only works with relocatable
|
@eerii ok all good if cerbero uses |
cbe7515
to
41ecc6c
Compare
Yeah, we use |
I finally had time to read through the whole proposal. It's good and seems sound. Some comments:
Yes. The default should be The way this is usually solved, is that the prebuilt prefix ships with its own pkg-config, which has a value hard-coded in it for where the pc files are. This is obviously wasteful when you're dealing with several different platforms, and might not even work when cross-compiling (the bindir will only contain host binaries, not binaries for your build machine). There are two possible solutions:
I don't have a strong preference here.
I don't understand this question. "Outermost" values should override "inner" values.
Throw an error and let the user deal with it. It's always better to be strict, because it's easier to allow use-cases that weren't allowed previously, than disallow use-cases that have always worked incorrectly. |
Thank you so much for the thorough comments!
That makes sense. What I was thinking was using the same cfg directives that handle downloads for different platforms to specify the library paths for each binary: [package.metadata.system-deps.A.'cfg(target = "linux_x86...")']
url = "linuxdownload.tar.gz"
pkg_paths = [ "linuxlib/pkg-config" ]
[package.metadata.system-deps.A.'cfg(target = "windows_x86...")']
url = "windowsdownload.tar.gz"
pkg_paths = [ "windowslib/pkg-config" ] But I agree that this can quickly become too verbose and difficult to handle. Searching the prebuilt for
Sorry, I wasn't very clear with this. But yes, the idea is that outermost values override inner ones.
And yes, I have been working on ensuring that this is done correctly and analyzing the different package graph cases and which are conflicting. I'm almost done fixing the implementation to account for this, I'll upload the revised version later today. |
I think I prefer your approach of requiring the user to specify the paths instead of trying to guess tbh, we should also probably show that right from the start in the documentation. |
I think that's a property of the prefix (and hence the tarball). It doesn't make sense to put in the configuration, and it just makes it easy to get it wrong. That's why I suggested that the prefix/tarball could specify it in a |
For example, if you want to add support for a new target (say, something based on RISC-V, or NixOS), you don't need to update every -sys crate (and potentially every project that uses that -sys crate). You just build some new binary tarballs and everything works. |
You will have to anyway to add the URI to the new tarball, unless we add some templating logic, which could also make sense.
That approach is fine by me, I do not like autodetection. Also it could be added later fmopv as we already have a working solution. Also, what is the point of having the .pc files in the tarballs platform dependent? They are never meant to be used on a system wide prefix, but should be consumed by intermediary apps. I have the impression that we could default to |
0756f2d
to
d52c97b
Compare
I thought that was the plan already, the manylinux URIs look templated :)
That's how distros work. Libraries and executables built for a particular distro will look for that distro's' libdir in whatever prefix you use. We tried to not do this in cerbero already, and that failed. |
Interesting, how did that fail? |
I don't remember anymore. Something around how modules are searched. All I remember is that in cerbero we tried to enforce |
c6e2884
to
a3a40b7
Compare
Hi! Update on this, I have been refactoring the proposal, cleaning old changes and improving the interface. I think that it is now at a point where I'm pretty happy with the public API and how the parsing system works. There may be a few details to still work on, but I think it is a good place to start reviewing this. I improved the error handling and added tests for both the metadata parsing and the binary downloads and filetypes. Since there are complex dependency cases with multiple pacakges, each new test generates the needed I updated the PR description, but here is a list of the important changes: Improvements on metadata parsingThe naive previous implementation only worked for very simple package trees and it didn't fail properly on conflicts and sometimes it produced wrong results. I have reworked this part from scratch and it should hopefully be more robust and able to handle more generic cases. It is also reusable outside of First, the base package(s) is read from the root manifest. For each one, we read the metadata and descend into its (unvisited) dependencies, building an "upside down" dependency tree containing the metadata. Starting from the leaves we reduce the value applying the correct overwrites and checking for conflicts. The final result is a single map with the calculated values. It also has many little improvements, such as type checking for values, extending lists instead of overwriting them, using # info.toml
paths = [ "lib/pkgconfig" ] Reworked how prebuilt binaries are shared:The # New version
[package.metadata.system-deps.gstreamer_1_0]
url = "..."
checksum = "..."
provides = [ "glib_2_0", "gobject_2_0", "gio_2_0", "gst*" ]
# Before
[package.metadata.system-deps.gstreamer_1_0]
url = "..."
checksum = "..."
[package.metadata.system-deps.glib_2_0]
follows = "gstreamer_1_0"
[package.metadata.system-deps.gobject_2_0]
follows = "gstreamer_1_0"
# Many more lines ... Internally, it is actually adding the Additionally, wilcards are supported for provided packages. This is particularly useful for Downloading and decompressingThese actions should now be faster and more reliable. They now happen in parallel and only if there wasn't a valid version. Checksums are properly calculated and checked. The library for handling the downloads has been changed from Reorganize the proposal structureInstead of having three crates and everything all over the place, now the logic is on the
What's next?The current version now finally covers the use cases indicated in the proposal, works for projects with complex dependency trees, and it has tests and error handling. There are a few details missing that are important before merging:
There are other features that I think are more suited to a followup PR, since this is already big enough and they are not essential:
|
Thanks for all your work on this! I'll let @thiblahute and @nirbheek review the design and user API as they know much more than me about this use case. Btw, feel free to clean up the branch (squash commits, etc) as it will be required before merging. |
This is a work in progress and there are still open questions.
This PR attempts to support precompiled binaries for libraries. Projects using
system-deps
will be able to specify an url1 for downloading and unpacking an archive containing static libraries andpkg-config
descriptions. Its paths are added to thePKG_CONFIG_PATH
for each library. This makes it possible to use projects with-sys
crate dependencies without having to install system dependencies.Usage example
Before going into the technical explanation, here is a brief example on how the proposed configuration could work. We have a project
P
that depends on the crateA
, a safe wrapper for the bindings ofliba
, in crateA_sys
. Then,A
depends onB
, similarly a wrapper aroundB_sys
.To use prebuilt binaries for
B_sys
, we can add it in the same place we configure thesystem_deps
metadata.When building
P
,libb.tar.gz
will be downloaded. To findlibb
,system-deps
will instructpkg-config
to look first atDOWNLOAD_DIR/B/lib/pkgconfig
, and if there is a valid library it will be linked. Users ofP
don't have to installlibb
.Let's say that
A_sys
also has available binaries. The configuration is the same as before. However, imagine that theA_sys
download already includes a version oflibb
and we want to use that instead. We can makeB_sys
use it like so:This time,
liba.tar.gz
will be downloaded and it will be used to link bothliba
andlibb
. Note that overriding options is only allowed from downstream crates, the outermost configuration will be used.Finally, if an user building
P
wants to use a local copy of the binaries instead of the provided one, or another url, it can overwrite yet again the configuration set byA_sys
:Previous work
Some crates in the rust ecosystem already provide some similar functionality. Two of the most relevant are
skia
andopenssl
, with different approaches.skia-bindings
to compile the source code in the CI and upload it to a release repository, with different versions for certain feature sets. For users, it first checks if there is a package that matches the version and features requested. If there is none, it falls backs to compiling the code locally.vendored
feature to control if it should use precompiled binaries. By default it queries the system for the needed libraries, but ifvendored
is enabled it usesopenssl-src
to download and build the source code.While these are great implementations, they are very specific to the crates they are in and they need certain prerequisites that may not work well for all libraries. For instance, they are working with a base crate with no other
-sys
dependencies that may also want binaries.Requisites
As
system-deps
is a general library, the proposal for using precompiled binaries needs to be as general as possible. The intended goal is to abstract the difficulty of managing the download and linking but still allow crates to configure when and how their binaries should be used. The only requirement is that they usepkg-config
files.It also needs to allow for multiple crates in the tree to request and link binaries, since many projects have a combination of libraries using
system-deps
. Additionally, some crates may want to share the same binary archive or may need to override the options of a dependency.The configuration should be as easy as possible for the end user. Ideally, the sys crates should provide the default binary location and the projects using would be able to use it without further configuration. It needs to be possible to specify different downloads for each target and version of the library.
Extra care needs to be taken to avoid unnecessary recompilation or downloads. Using precompiled binaries is specially useful for less powerful systems since it avoid extra compute work, so it is important to optimize the solution.
Configuration
The main issues when developing this implementation were related to dependency ordering and metadata reading. Cargo is very strict with some of its features, and communicating between crates during compilation is difficult.
system-deps
runs in the build script of the system libraries, once for each dependency that uses it in the project.Reuse current system
We need a way of specifying a set of attributes (url, checksum, paths, ...) for each binary archive we want to download.
system-deps
already does something similar with cargo metadata, with one caveat: It only reads the manifest of the crate that it is being compiled. For this use case this is too limiting, since it would mean that every dependency would have to specify its own binary url and that downstream projects would have no way of overriding this behaviour later. It also doesn't allow sharing binary archives across crates, since the downloads would be ran in parallel.Environment variables
Using environment variables is also not enough. While we could ask the user to set overrides for certain crates (
SYSTEM_DEPS_BINARY_CRATE_NAME_URL
,SYSTEM_DEPS_BINARY_CRATE_NAME_PKG_PATHS
, ...), this is too verbose, forces the user to add configuration, and it misses one feature: intermediate dependencies overriding their dependencies. Even if we supported adding environment in the build script (#114), we can't do that in middle crates since they are actually running a different instance ofsystem_deps
.Patch dependency
A previous prototype tried to "inject" a dependency before
system-deps
ran. This was used as its configuration provider. Since we needed something that ran before everysystem_deps
instance, we added a dependency tosystem-deps
that could be patched by the caller. This worked because it didn't break the dependency build order, but it was very hacky and it required that users added a[patch]
pointing to the correct subcrate since that directive has to appear in the top level manifest. Additionally, this didn't allow configuration to come from multiple crates and it was not straightforward to write.Read metadata from project root
Finally, we arrived at the current solution. If we manage to obtain the top level project directory (the one that has the outmost
Cargo.toml
), we can recursively search the dependency tree from there and read the metadata of its crates.There is an ongoing discussion on how should cargo provide the workspace root of the project (rust-lang/cargo#3946). Other crates such as
bevy
andtoml-conf
have a similar use case for it as we do. Even if there is no official method to do so at the moment (which we would really appreciate), we found a reliable alternative, inspired by whattoml-conf
does.Build scripts get passed one argument that can be read using
std::env::args
. This points to the build directory they have assigned in the root target directory. This can be different than readingOUT_DIR
, which is a target directory in the folder where the dependency is cloned (/project/dir/target
vs.cargo/checkout/dep/target
). Because that directory is inside the project, we can runcargo locate-project
to get the workspace root, which contains the top levelCargo.toml
manifest. This functionality is implemented inmeta/build::manifest
, more details on the design later.It works by default in most cases: On different compilation profiles, if the target directory is renamed, using dependencies from crates.io, git and local paths. The only instance in which we found that we can't find it is
TARGET_DIR
is explicitly set to a folder that is not a subdirectory of the top level project (understandably). We provide theSYSTEM_DEPS_MANIFEST
variable that allows to set theCargo.toml
file that will be used. In the rare cases where it fails to locate the project root, it shows the user how to set this variable, which can be as easy asSYSTEM_DEPS_MANIFEST=$(pwd)/Cargo.toml
or writing[env] SYSTEM_DEPS_MANIFEST = { value = "Cargo.toml", relative = true }
to.cargo/config.toml
.Ideally Rust could provide this path (or at least some way of defining global metadata) so we wouldn't need to do this. However, we feel like this is a good solution that works with minimal user configuration, and the worst case scenario is setting a variable. We could also support alternative ways of providing the binary config, like environment variables as mentioned before.
Design
The final design for the proposed changes is divided in two modules.
Metadata and binaries
The
system-deps-meta
crate has two purposes:First, it's build script finds the root manifest as we discussed before. This needs to happen in a build script because cargo passes different arguments to libraries and build scripts, and the path we need is only passed for the latter. Additionally, it also exports a target directory (defaults to it's
OUT_DIR
) where the binaries will be stored. This also needs to be defined in the build script because all of the libraries depending onsystem_deps
need to have access to the same binary directory to share binaries, so we need to compile it as a constant.The library part of the crate uses
cargo-metadata
to parse the root manifest and recursively search the dependency tree for metadata. It returns a table with all of the metadata values related tosystem-deps
. The merging function needs some improvements, specially to make sure that only downstream crates can overwrite the previously set metadata, and to handle multiple dependents on the same crate.The good thing is that this crate can be used from other dependencies to also query the workspace metadata without having to write complex logic. For example, we tested it as a method of specifying which plugins to link in
gstreamer
using the[package.metadata.gstreamer-plugins]
manifest key, following the same rules as before.System deps and pkg config
The
system-deps
build script (in a previous version this was in another separate crate) handles the downloading, unpacking and linking of the binary archives. It almost exclusively happens during its build script for two reasons: The binary download needs to happen before any crate that usessystem_deps
gets built, and so that it only happens once.Finally, in
system-deps
, when we are building each library, we get the paths to the folders that contain the.pc
files from the downloaded files. We modify the environment in the call topkg_config
prepending these paths to thePKG_CONFIG_PATH
so that the libraries from the binaries are prioritized in the search. The.pc
files must use a relative prefix so that it works inside of the extracted folder, but this is a common option in meson and other build systems.Build process diagram
Since explaining how the build dependencies work can be a mouthful, I think this diagram can be useful to understand which crates are compiled before which. The order is from right to left. Note that the only guarantee is that dependencies are compiled before dependents, and that the build script goes before the library, but the actual order can vary for each run.
Motivation
We are working towards adding prebuilt binaries for
gstreamer-rs
. It has multiple system libraries, and it depends on external ones such asglib-sys
. Our goal is for the project to provide the official prebuilt binaries so that other crates can use them directly, by only having to enable thebinary
feature (similar toopenssl
usingvendored
). This proposal started as agstreamer-rs
only project, and went through quite a bit of iterations, but we realized some of the ideas were generalizable to other crates usingsystem-deps
.I still need to rebase, clean and document the
gstreamer-rs
part of the implementation, so it might be messy at the moment, but the development is happening in this branch.TODO:
system-deps
does.binary
andmeta
.Followups
cargo-metadata
with a similar implementation based onserde-toml
so that deserialization libraries are not duplicated. It would make sense to move the current logic frommetadata.rs
tometa
.Footnotes
Even if this mentions "download" and "url", local files or folders are allowed using links that start with
file://
. We took inspiration from skia for this one. ↩