Splitting rust_toolchain
into multiple toolchains
#815
Replies: 6 comments 7 replies
-
I've got a draft PR (#800) of this open if anyone needs something more concrete than what I've written above. Please note it's just a draft and I'll only be able to retain high level feedback since it's still fairly volatile. Depending on how that issue goes, I may re-open it under a new PR to make it easier to review after some feedback/changes. |
Beta Was this translation helpful? Give feedback.
-
Therefore I propose either or both of:
|
Beta Was this translation helpful? Give feedback.
-
I'm postulating that toolchain authors and rules authors are often enough not the same people, and for rules authors what you propose is a regression. For maintainers of the rules_rust your proposal (and your draft implementation) incur a lot of complexity that is on you to prove is warranted. You keep mentioning
What I read from this paragraph is:
I don't see a problem with pre-defining 120 toolchains as long as none of them re-downloads stuff that other toolchains downloaded, and none of them downloads unnecessary things. All can be achieved with a current
What do you mean by This to me is taking more advantage of toolchain resolution since the way the rules are defined now, we generally have a repository generated with all capabilities and a toolchain that only differs in flags needed by each target. Instead, we should use toolchain resolution to gate what artifacts need to be downloaded in order to perform a build, in addition to providing target specific flags (which is the part already being done). Ok I think this is where we don't understand each other. Please tell me why the following hypothetical repository layout doesn't work (using hypothetical so we don't get bogged down to current impl details): @rust_toolchains is a repository rule that defines 120 toolchain(
name = "rust_linux_x86_64",
exec_compatible_with = [
"@platforms//cpu:x86_64",
"@platforms//os:linux",
],
target_compatible_with = [
"@platforms//cpu:x86_64",
"@platforms//os:linux",
],
toolchain = "@rust_toolchain_x86_64//:rust_toolchain",
)
...
register_toolchains("//:all") Then in the rust_toolchain(
name = "rust_toolchain",
rustc = ["@rust_tools_linux//:rustc"],
rust_lib = ["@rust_stdlib_x86_64//:rust_lib"],
...
) With this layout, we only eagerly load the Because we have
We can define all the possible 120 toolchains just as well. No unnecessary data is downloaded.
I don't see why users would have to download any unnecessary stdlibs. In the example above I show how they only need to download what they need for the current build. Nothing more.
Are you proposing to remove clippy/rustfmt/cargo from the rust_toolchain? That is a different proposal than the one we are talking about. Please let's not make this one cover more ground than necessary. I'm happy to have a separate discussion for this.
I remain not convinced that this proposal will have any measurable effect on amount of downloaded data and build performance. I still think the biggest usability wins for the toolchain owners are to be held at our repository rules, not in the rust_toolchain. Looking at your draft PR I am skeptical that the split will simplify the implementation. |
Beta Was this translation helpful? Give feedback.
-
After a conversation with @hlopko I finally found the time to try a different interface.
rust_repository_set(
prefix = "some_prefix",
edition = "2018",
exec_triples = [
"x86_64-unknown-linux-gnu",
["aarch64-unknown-linux-gnu", "@platforms//cpu:aarch64"],
],
target_triples = [
"x86_64-unknown-linux-gnu",
["aarch64-unknown-linux-gnu", "@platforms//cpu:aarch64"],
["x86_64-unknown-linux-musl", "//custom:constraint"],
["aarch64-unknown-linux-musl", "@platforms//cpu:aarch64", "//custom:constraint"],
["i686-unknown-linux-gnu"]
],
rustfmt_version = "1.4.12",
sha256s = {
"rust-1.46.0-x86_64-unknown-linux-gnu": "e3b98bc3440fe92817881933f9564389eccb396f5f431f33d48b979fa2fbdcf5",
"rust-std-1.46.0-x86_64-unknown-linux-gnu": "ac04aef80423f612c0079829b504902de27a6997214eb58ab0765d02f7ec1dbc",
"rustfmt-1.4.12-x86_64-unknown-linux-gnu": "1894e76913303d66bf40885a601462844eec15fca9e76a6d13c390d7000d64b0",
},
version = "1.46.0",
) This approach allows users to pass a list or string to One thing that bothers me about this approach is that it has an uncommon API where a single attribute consumes multiple types of data and generates a different structure to have something that can be iterated over. I feel that by introducing any API that doesn't simply allow users to set exec and target constraints is unnecessary burden of knowledge and adds a step to debugging why the rules might not be doing what's expected after the normal toolchain resolution debugging. In the other variants I've attempted, they all come down to this issue. I'm still convinced that having separate execution and target toolchains is the right approach to allow for custom constrained toolchains without re-implementing toolchain resolution in repository rules or forcing users to copy/paste whole toolchain definitions. However, there could still be an approach I'm not seeing. |
Beta Was this translation helpful? Give feedback.
-
Some random thoughts:
|
Beta Was this translation helpful? Give feedback.
-
While I still think the toolchain should be split into |
Beta Was this translation helpful? Give feedback.
-
Overview
This discussion is inspired by the desire to do cross-compilation with the rules. There seem to be many issues related to this (#770, #276, #207). The goal is to take advantage of Bazel's toolchain resolution and allow the rules to, by default, cover more target platforms without incurring a cost to build time by downloading unnecessary components.
The Problem
Defining target toolchains is clunky.
Currently, it's difficult to define an appropriate toolchain that supports the desired target platforms due to the implementation of rust_toolchain.
rust_toolchain
is monlithic as it contains everything in a rust asset bundle (clippy
,cargo
,rustc
,rust-stdlib
,rustfmt
). The toolchain is typically defined by calling rust_repository_set which internally calls rust_toolchain_repository_proxy to produce a repository which containsThese then get registered here:
rules_rust/rust/repositories.bzl
Lines 307 to 313 in 7e7246f
The result is that there's now there's multiple nearly identical toolchains (in this case,
toolchain_for_x86_64-apple-darwin_impl
,toolchain_for_wasm32-unknown-unknown_impl
, andtoolchain_for_wasm32-wasi_impl
) which only differ slightly byrust_lib
and other fields like the*_ext
fields. This is not a huge issue when you're only going to be building from one host/exec platform but in the case your project supports multiple, it can be pretty uncomfortable to now manage multiplerust_repository_set
definitions with all the same target platforms.Solution
By breaking up
rust_toolchain
we allow for users to separately register their host/exec toolchains (which should contain things likerustc
andrustc_lib
) and their target toolchains (which should containrust-stdlib
).This can be achieved by splitting
rust_toolchain
into:rust_exec_toolchain
: containsrustc
,rustc_lib
, default edition, exec triple, version inforust_target_toolchain
: containsrust_stdlib
, extension defs, target triple, version infoEach new toolchain is then accompanied by a repository rule
rust_exec_toolchain_repository
andrust_target_toolchain_repository
. This allows for exec and target toolchains to only be defined once and relies more on Bazel's toolchain resolution which should result in interfacing with other rules easier and more intuitive (since they'd be sharing a similar interface).Notes
The thing that ultimately set me down this journey was a conversation in the slack channel with @ccontavalli and @jmillikin where @jmillikin shared a great blog post about a cross-compilation journey which I felt clearly outlined the issues but also showed a credible path forward. This is my interpretation of that path forward. I'm curious what others think or if there's a better way to make toolchain resolution more intuitive and suited for cross compilation.
And also to note, this doesn't directly add support for cross compilation but I think is critical in maintaining cross-compilation projects.
Looking forward to collaborating on making cross-compilation an easier thing!
Beta Was this translation helpful? Give feedback.
All reactions