lib.systems: introduce toolchain, cc, and bintools attributes #365057

Open
wants to merge 2 commits into
base: master

RossComputerGuy
Member

@RossComputerGuy RossComputerGuy commented Dec 13, 2024

Things done

Replaces useLLVM, useArocc, and useZig with toolchain, cc, linker, and bintools attributes. This might not produce any rebuilds, but we'll see. The advantage is that it prevents the use${compiler} flags from colliding and misbehaving if we were to stack multiple pkgs* sets together.
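
As a rough illustration of the change from a user's point of view, a cross system definition might move from a boolean flag to named attributes. The attribute names come from this PR; the values `"clang"`, `"lld"`, and `"llvm"` given to `cc`, `linker`, and `bintools` are assumptions made for the example, not values confirmed by the diff.

```nix
# Sketch only: plain attribute sets, not evaluated against Nixpkgs.
{
  # Before: one boolean per compiler family.
  before = {
    config = "aarch64-unknown-linux-gnu";
    useLLVM = true;
  };

  # After: each component is named explicitly.
  # The value strings for cc, linker, and bintools are assumed.
  after = {
    config = "aarch64-unknown-linux-gnu";
    toolchain = "llvm";
    cc = "clang";
    linker = "lld";
    bintools = "llvm";
  };
}
```

With booleans, nothing prevents `useLLVM` and `useZig` from both being true, whereas a single `cc` attribute can hold only one value.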

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 25.05 Release Notes (or backporting 24.11 and 25.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@github-actions github-actions bot added 6.topic: python 6.topic: rust 6.topic: windows Running, or buiding, packages on Windows 6.topic: stdenv Standard environment 6.topic: systemd 6.topic: lib The Nixpkgs function library 6.topic: llvm/clang Issues related to llvmPackages, clangStdenv and related labels Dec 13, 2024
@RossComputerGuy RossComputerGuy force-pushed the feat/toolchain-attrs branch 3 times, most recently from 7a3eca0 to 2a005fa Compare December 14, 2024 00:20
@RossComputerGuy RossComputerGuy force-pushed the feat/toolchain-attrs branch 2 times, most recently from 9c8b44b to f6da63d Compare December 14, 2024 00:41
@tomberek
Contributor

The main idea and motivation is in: pkgs/stdenv/cross/default.nix

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/community-team-updates/56458/11

Member

@emilazy emilazy left a comment

I am not sure this approach is correct. useLLVM is indeed a confusing mess that couples a bunch of things we probably don’t want to couple together, and that causes issues for Darwin. However, if we split it up – which I agree is a good idea – then what remaining meaning does toolchain have? If it represents things like use of libunwind, those should also be split up into their own fields. If, however, toolchain is just a convenience to set all the other variables as #365057 (comment) suggests, then it’s definitely incorrect for packages to be conditioning on its value – as by definition, a different toolchain value should have no meaning if all the other values are the same, and packages should instead condition on exactly what part of the platform they care about, which is the hard part of splitting up useLLVM.

@sternenseemann
Member

I have to agree; it seems to me that we are replacing an ad hoc system with a better, but still ad hoc, system that is, however, more difficult to test. The use* flags, as ugly as they may be, just add another way of bootstrapping the package set (per supported local/crossSystem combination). That approach has other problems of course, especially when we have a lot of such flags. What the necessity of the toolchain argument implies is that you can't (yet) arbitrarily combine pieces of a toolchain. I would question whether even our own wrapper code is free of such assumptions.

What concerns me about the proposal presented here is that it doesn't achieve a clear separation between the parts it introduces:

  • toolchain overlaps with everything else, but can't be reduced to a shorthand for other flags. It implies linker, cc, bintools, but other code will still have to check toolchain. This is problematic and will invite expressions to check the incorrect value, because in normal cases they are associated with each other.
  • bintools overlaps with linker. I appreciate this is future work, but then I'd at least enforce that the behavior matches the prescription, i.e. if we have conflicting bintools and linker, I'd fail for now.

If it represents things like use of libunwind, those should also be split up into their own fields.

I think another problem of our current approach with elaborate is that in principle every field can be set manually and be computed. Some fields should probably always be computed because they are a property of the platform or a property of the combination of platform and toolchain, so it would just not work to specify this manually. This is stuff like exceptions, runtime libs, libunwind etc. typically.

It doesn't make sense to remove such properties from the platform set necessarily because a) it's best that they are managed in a central place so expressions don't need to duplicate the logic and b) what may be computed for some platforms is configurable for others [citation needed] or may become configurable later [citation needed].
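
The "fail for now" suggestion for conflicting bintools and linker values could be sketched roughly as follows. The `compatibleLinkers` table and the `checkPlatform` helper are hypothetical, as are the value strings; a real table would have to be derived from what Nixpkgs actually supports.

```nix
# Sketch only: the compatibility table is hypothetical.
let
  # For each bintools value, the linker values accepted alongside it.
  compatibleLinkers = {
    gnu = [ "bfd" "gold" ];
    llvm = [ "lld" ];
  };

  # Return the platform unchanged, or abort evaluation on a conflict.
  checkPlatform = platform:
    let
      allowed = compatibleLinkers.${platform.bintools} or [ ];
    in
    if builtins.elem platform.linker allowed then
      platform
    else
      throw "linker '${platform.linker}' conflicts with bintools '${platform.bintools}'";
in
{
  ok = checkPlatform { bintools = "llvm"; linker = "lld"; };
  # Uncommenting the next line aborts evaluation with the message above.
  # bad = checkPlatform { bintools = "gnu"; linker = "lld"; };
}
```

Failing at elaboration time keeps unsupported combinations from silently producing a half-configured toolchain.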

@emilazy
Member

emilazy commented Dec 27, 2024

  • toolchain overlaps with everything else, but can't be reduced to a shorthand for other flags. It implies linker, cc, bintools, but other code will still have to check toolchain. This is problematic and will invite expressions to check the incorrect value, because in normal cases they are associated with each other.

Right. What I’d like is that toolchain is only shorthand for other flags, and that all downstream packages check linker, cc, bintools, unwind, etc. It’s okay if we don’t make everything downstream perfectly orthogonal for now as long as we make an effort to check what the packages actually care about.

Though, really, once we do that, it seems that the utility of the toolchain shorthand is small enough – constructing custom cross package sets is not that common a use‐case – that we can simply dispense with it entirely.

  • bintools overlaps with linker. I appreciate this is future work, but then I'd at least enforce that the behavior matches the prescription, i.e. if we have conflicting bintools and linker, I'd fail for now.

Actually, I believe the correct thing for Darwin today is LLVM bintools but Apple linker. AIUI we use very little of cctools at this point and it’s basically an LLVM platform as far as bintools go.
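
One way to read "toolchain is only shorthand for other flags" is a small expansion step that fills in defaults and then discards the shorthand. This is a sketch under assumptions: `presets` and `expand` are hypothetical names, and every value string other than the toolchain names `"gnu"` and `"llvm"` is made up for illustration.

```nix
# Sketch only: `presets`, `expand`, and the component value strings
# are hypothetical.
let
  presets = {
    gnu = { cc = "gcc"; linker = "bfd"; bintools = "gnu"; };
    llvm = { cc = "clang"; linker = "lld"; bintools = "llvm"; };
  };

  # Expand the shorthand, let explicit fields win, then drop `toolchain`
  # so that nothing downstream can condition on it.
  expand = args:
    builtins.removeAttrs (presets.${args.toolchain} // args) [ "toolchain" ];
in
{
  linux = expand { toolchain = "llvm"; };
  # LLVM bintools with a different linker, as in the Darwin case above.
  darwin = expand { toolchain = "llvm"; linker = "apple"; };
}
```

Because the result no longer carries a `toolchain` attribute, packages are forced to check `cc`, `linker`, or `bintools` directly.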

@RossComputerGuy
Member Author

However, if we split it up – which I agree is a good idea – then what remaining meaning does toolchain have?

Toolchain covers more than just the cc, linker, and bintools. It guards the previous useLLVM checks as toolchain == "llvm". So it doesn't just specify which options are in use; it also serves as a simple definition of which toolchain we have in use.

  • It implies linker, cc, bintools, but other code will still have to check toolchain. This is problematic and will invite expressions to check the incorrect value, because in normal cases they are associated with each other.

I can update the predicates to add fixed attributes which change based on the new attributes. Then it'll be easier to check. But I think having better documentation on what toolchain could be would also suffice. If people were to use a wrong value, it'd be immediately clear since there wouldn't be a change in behavior.

  • but then I'd at least enforce that the behavior matches the prescription, i.e. if we have conflicting bintools and linker, I'd fail for now.

I'm not sure I follow with the behavior. Some of the new behavior is in place but to get any further would require significant work and likely would have to go to staging. My thinking was to leave that change for later, especially since a lot of packages and stuff will require tuning.

@emilazy
Member

emilazy commented Dec 27, 2024

Toolchain covers more than just the cc, linker, and bintools. It guards the previous useLLVM checks as toolchain == "llvm". So it doesn't just specify which options are in use; it also serves as a simple definition of which toolchain we have in use.

That’s precisely the problem: we split things up, but don’t condition on the split things, so we still essentially have one monolithic coupled toolchain selector with the illusion of more specificity. “What toolchain? LLVM” is not an answer that makes sense if you can vary other components of the toolchain freely, and indeed this causes problems in practice now for Darwin. The only way to split it up that makes sense is to cover every bit of relevant variance as separate fields and ensure that all downstream packages look at those.

@RossComputerGuy
Member Author

The reason it's like that is because I was running into infinite recursions and cases where things weren't applying correctly. Darwin uses the apple toolchain, not the llvm one. If Darwin tries using the llvm one then we get infinite recursions within libunwind and libcxx.

@emilazy
Member

emilazy commented Dec 27, 2024

But the Darwin toolchain is actually almost entirely LLVM except for the linker. Those issues are what would need fixing to make this a good step forward, rather than just adding additional variables that aren’t actually independent because of bugs in Nixpkgs. If we want to split up the toolchain fields then nothing downstream of stdenv should be reading toolchain at all.

@RossComputerGuy
Member Author

Do we want this PR to have mass rebuilds on Darwin but none on Linux? Also, I'm not familiar with how Darwin does things, so changing toolchain to use llvm will break a lot of things. To me, that feels out of the scope of this PR. My intention with this PR was just to get started with adding the new attributes. In its current state, it reflects how upstream operates. The only rebuild right now is just the manual, which usually happens.

@RossComputerGuy
Member Author

Another note is we'll likely have to change toolchain == "llvm" to toolchain == "llvm" && !isDarwin since things differ enough to warrant it.

@nix-owners nix-owners bot requested a review from zowoq December 27, 2024 15:42
@RossComputerGuy
Member Author

I've got a log of the infinite recursion: https://gist.github.com/RossComputerGuy/d43df8fc4c3b6fe4adf8020db8c3a94a

@emilazy
Member

emilazy commented Dec 27, 2024

Another note is we'll likely have to change toolchain == "llvm" to toolchain == "llvm" && !isDarwin since things differ enough to warrant it.

Again, there shouldn’t be any conditions on toolchain == "llvm" at all if we’re splitting toolchain up into multiple variables. The ways in which Darwin differs should be reflected in other variables like linker. If we just want to move away from booleans, then we could perhaps make toolchain an enum without introducing the new variables which will only cause correctness issues and confusion if toolchain is still the ultimate source of truth.

@RossComputerGuy
Member Author

Again, there shouldn’t be any conditions on toolchain == "llvm" at all if we’re splitting toolchain up into multiple variables.

Then how do we get the same logic as useLLVM without using that attribute? There are things that need changing for which checking the CC or bintools isn't enough.

@emilazy
Member

emilazy commented Dec 27, 2024

The question is what they need to check it for. If it’s for the unwinding library, we should have a field for that too. If it’s about the C++ library, we should have a field for that too. If we continue to treat “LLVM toolchain” as a monolithic thing, then we won’t be able to vary these separate fields coherently, and we’ll continue to have issues with Darwin, so introducing these new attributes brings no real benefit.
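
To make the point about conditioning on what a package actually cares about more concrete, here is a minimal sketch. The field names `unwinder` and `cxxlib` are assumptions (no such platform fields are confirmed by this PR), and the flag lists stand in for real package inputs.

```nix
# Sketch only: `unwinder` and `cxxlib` are hypothetical field names.
let
  # Local stand-in for lib.optionals, to keep the sketch self-contained.
  optionals = cond: xs: if cond then xs else [ ];

  # Each condition reads exactly one axis of the platform.
  linkFlagsFor = hostPlatform:
    optionals (hostPlatform.unwinder == "libunwind") [ "-lunwind" ]
    ++ optionals (hostPlatform.cxxlib == "libcxx") [ "-lc++" ];
in
{
  llvmLinux = linkFlagsFor { unwinder = "libunwind"; cxxlib = "libcxx"; };
  mixed = linkFlagsFor { unwinder = "libgcc"; cxxlib = "libcxx"; };
}
```

Neither condition mentions a toolchain name, so a platform that mixes components still gets the flags that match each component.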

@RossComputerGuy
Member Author

Ok, we'll probably need one for the C++ library, CC unwinder, and rtlib. Usually those are coupled with cc but can be different based on the toolchain in use.

@emilazy
Member

emilazy commented Dec 27, 2024

IIRC there’s already checks like stdenv.cc.libcxx != null that let you condition on the C++ library. So not everything necessarily has to be a new platform field.

@RossComputerGuy
Member Author

RossComputerGuy commented Dec 27, 2024

That is null on the top-level stdenv outside of Darwin, so it doesn't help in every case.

@RossComputerGuy
Member Author

OK, toolchain is now used less throughout. There are cases where I just don't know what to use, so sometimes it's an isClang && !isDarwin. In many cases, though, we're using attributes that better describe the platform.

Member

@emilazy emilazy left a comment

I’ll try to take a closer look at all the changed conditionals once we’ve settled on a design here; keeping my feedback to the lib changes for now.

Comment on lines +69 to +73
elaborate = systemOrArgs:
  assert lib.assertMsg (systemOrArgs ? useLLVM == false) "elaborate cannot contain the deprecated useLLVM attribute";
  assert lib.assertMsg (systemOrArgs ? useArocc == false) "elaborate cannot contain the deprecated useArocc attribute";
  assert lib.assertMsg (systemOrArgs ? useZig == false) "elaborate cannot contain the deprecated useZig attribute";
  let
Member

We might need a deprecation period for this.

Member Author

Probably 25.11 would be good to drop?
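
A deprecation period could, for instance, translate the old flags and emit a warning instead of asserting. The sketch below is an assumption about how that might look, not what the PR does: the `legacyFlags` mapping (including the `"arocc"` and `"zig"` values) and the `migrate` helper are hypothetical. It uses `builtins.trace` to stay self-contained; inside Nixpkgs, `lib.warn` would be the more natural choice.

```nix
# Sketch only: the mapping from old flags to new attributes is assumed.
let
  legacyFlags = {
    useLLVM = { toolchain = "llvm"; };
    useArocc = { cc = "arocc"; };
    useZig = { cc = "zig"; };
  };
  legacyNames = builtins.attrNames legacyFlags;

  migrate = args:
    let
      # Legacy flags that are present and set to true.
      used = builtins.filter (n: args.${n} or false) legacyNames;
      # New-style attributes implied by those flags.
      translated = builtins.foldl' (acc: n: acc // legacyFlags.${n}) { } used;
      # Explicit new-style attributes win over translated ones.
      result = translated // builtins.removeAttrs args legacyNames;
    in
    if used == [ ] then
      result
    else
      builtins.trace
        "warning: deprecated attributes: ${builtins.concatStringsSep ", " used}"
        result;
in
migrate { config = "x86_64-unknown-linux-gnu"; useLLVM = true; }
```

The hard assertion from the diff could then replace this shim once the deprecation window closes.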

Comment on lines +99 to +102
toolchain =
  /**/ if final.isDarwin then "apple"
  else if final.isFreeBSD || final.isOpenBSD then "llvm"
  else "gnu";
Member

Since we don’t have a clear separation between inputs and outputs for system elaboration (something that I agree we should fix in general), and since this should just be shorthand for setting a handful of other values in a platform definition – something that isn’t done very often at all – I think we should just drop toolchain here, and specify cc, bintools, etc. manually. That will ensure that no code is conditioning on toolchain and keep things factored out, at the cost of a bit of extra verbosity in the rare case that someone is defining a new platform.

Member Author

The problem I have with dropping toolchain is then there's no easy way to get what the entire toolchain is. There's still cases where we cannot just check the C++ stdlib, CC, or the bintools.

Member

But I agree with #365057 (comment) that there is no such thing as “the entire toolchain”. The only way factoring out the components of what we currently gloss as a GCC toolchain or an LLVM toolchain or an Apple toolchain leads to a more consistent and flexible system is if we actually tease out the axes of difference. For instance, there is not really an “Apple” toolchain (at least not in Nixpkgs); Darwin uses a mostly LLVM toolchain with a few Apple components. You could want to build for Darwin using LLD (though that currently doesn’t work so well). You could want to build for Linux with Clang, libstdc++, and GNU binutils. Or for Linux with Clang, libc++, and mold.

So my answer to

There's still cases where we cannot just check the C++ stdlib, CC, or the bintools.

is that we have to enumerate those cases, figure out what the conditionals actually care about, and make sure those are represented orthogonally in the systems. If the fields are not orthogonal – if there’s a reason some of them must be tightly coupled – then we don’t gain anything by allowing them to vary independently.
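
The mixed configurations named above could be written down as plain data to see which axes would need to exist. This is only a sketch: `cxxlib` is the name floated later in this thread, `"libcxx"` and `"libstdcxx"` follow the values in the diff, and every other value string (as well as the unstated bintools and linker choices) is assumed for illustration.

```nix
# Sketch only: value strings other than "libcxx" and "libstdcxx" are
# assumed, and so are the components the comment above leaves unstated.
{
  # Linux with Clang, libstdc++, and GNU binutils.
  linuxClangGnu = { cc = "clang"; cxxlib = "libstdcxx"; bintools = "gnu"; linker = "bfd"; };
  # Linux with Clang, libc++, and mold.
  linuxClangMold = { cc = "clang"; cxxlib = "libcxx"; bintools = "llvm"; linker = "mold"; };
  # Darwin built with LLD instead of the Apple linker.
  darwinLld = { cc = "clang"; cxxlib = "libcxx"; bintools = "llvm"; linker = "lld"; };
}
```

None of these three sets can be described by a single toolchain name, which is the argument for modelling each axis separately.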

Member Author

Look at cases where we have something like toolchain == "llvm" in this PR. Some things, like checks, don't work under the toolchain but work with regular clang. It would be difficult and time-consuming to go through and figure out which option specifically triggers the behavior. However, it would be the best option. Unfortunately, the full fine-grained control just isn't there. Plus, having a toolchain option makes it easier to specify what to set everything to, like we had with use*.

Comment on lines +114 to +117
libcxx =
  /**/ if final.toolchain == "llvm" then "libcxx"
  else if final.toolchain == "gnu" then "libstdcxx"
  else null;
Member

@emilazy emilazy Jan 16, 2025

Are hostPlatform.libcxx == "libcxx" and stdenv.cc.libcxx != null ever going to disagree? If not, we can probably drop this? (Edit: Actually I guess we probably need it to decide whether we’re using libc++ in the first place.)

If we keep it, we should probably give it a name that doesn’t overlap with one of the options, to avoid confusion. There’s also the distinction of libc++abi vs. libcxxrt even when using libc++ (which is relevant for FreeBSD; cc @rhelmot).

Member Author

Are hostPlatform.libcxx == "libcxx" and stdenv.cc.libcxx != null ever going to disagree?

Yes.

nix-repl> legacyPackages.aarch64-linux.stdenv.cc.libcxx
null

nix-repl> legacyPackages.aarch64-linux.stdenv.hostPlatform.libcxx
"libstdcxx"

stdenv.cc.libcxx doesn't look like a reliable way to get the C++ stdlib.

If we keep it, we should probably give it a name that doesn’t overlap with one of the options, to avoid confusion. There’s also the distinction of libc++abi vs. libcxxrt even when using libc++.

So what would be a better name? cxx-std?

Member

I’m not too fussed about the name; cxxlib seems consistent with the other choices in this PR. (That result seems consistent to me, though? aarch64-linux doesn’t use libc++, so stdenv.cc.libcxx is null.)

Labels
6.topic: lib The Nixpkgs function library 6.topic: llvm/clang Issues related to llvmPackages, clangStdenv and related 6.topic: python 6.topic: rust 6.topic: stdenv Standard environment 6.topic: systemd 6.topic: windows Running, or buiding, packages on Windows 10.rebuild-darwin: 1-10 10.rebuild-darwin: 1 10.rebuild-linux: 1-10