-
-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve UX with CA derivations #875
base: master
Are you sure you want to change the base?
Conversation
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/tweag-nix-dev-update-8/11758/1 |
e0b9c88
to
74eb4e4
Compare
4cb5eee
to
ff0a9b5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works!
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/tweag-nix-dev-update-10/12587/1 |
t/content-addressed.t
Outdated
ok(evalSucceeds($jobset), "Evaluating jobs/content-addressed.nix should exit with return code 0"); | ||
ok(nrQueuedBuildsForJobset($jobset) == 3 , "Evaluating jobs/content-addressed.nix should result in 3 builds"); | ||
|
||
for my $build (queuedBuildsForJobset($jobset)) { | ||
ok(runBuild($build), "Build '".$build->job."' from jobs/content-addressed.nix should exit with code 0"); | ||
my $newbuild = $db->resultset('Builds')->find($build->id); | ||
my $expected = $build->job eq "fails" ? 1 : $build->job =~ /with_failed/ ? 6 : 0; | ||
ok($newbuild->finished == 1 && $newbuild->buildstatus == $expected, "Build '".$build->job."' from jobs/content-addressed.nix should have buildstatus $expected"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check out the other test files: there are other helpers like is
which, when the test fails, prints the actual/expected values. Also, please don't check two facts at once (ie: don't use &&
in a check.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh indeed. This test was actually a copy/paste of the “basic” test, but it didn't follow the overhaul of the test infrastructure.
Fixed in a12704f
t/jobs/content-addressed.nix
Outdated
mkDerivation = args: cfg.mkDerivation ({ | ||
__contentAddressed = true; | ||
outputHashMode = "recursive"; | ||
outputHashAlgo = "sha256"; | ||
} // args); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to extend config.nix.in (and regenerate with make) with more functions, like makeContetAddressedDerivation :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Done in 8ef36cc
src/lib/Hydra/Controller/Build.pm
Outdated
# XXX: If ca-derivations is enabled then this will always return false | ||
# because `$_->path` will be empty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this always true, or is it true if it is set AND the derivation is ca-addressed? This feels like a fairly spooky change to the data model of Hydra.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mh… I see what you mean. It's only true if the derivation is ca. I've updated the comment in ca86783 to make it clearer.
This feels like a fairly spooky change to the data model of Hydra.
It is indeed 😃 . But the path
must be nullable, because of the very nature of CA derivations
I should have done this as a review, sorry. In the C++ I'm seeing a lot of in-place complication. Would it be possible to clean up the campsite a bit, and make some functions and break up the behavior a bit? I'm increasingly uncertain about understanding those pieces of the code. |
I also wonder if there would be anything interesting in tests for:
|
Thanks for the review @grahamc
A large part of the complication is annoyingly due to the fact that we need to support both the nix+ca-derivations and nix-without-ca-derivations cases, which means two slightly different code paths. But it should be indeed possible to factor at least part of it out.
I’d have to think it a bit more, but I think there's a number of things worth testing here indeed. |
basicDrv.inputSrcs.insert(*outPath); | ||
BasicDerivation basicDrv; | ||
auto outputHashes = staticOutputHashes(*localStore, *step->drv); | ||
if (auto maybeBasicDrv = step->drv->tryResolve(*localStore)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: Reading through the code again, I'm actually wondering whether this isn't messing-up the build of non-ca derivations (because tryResolve
changes the output paths, so applying it unconditionally might do some funny things). I'd expect this would cause a test failure (or a hydra crash), but it's worth double-checking first
I've added a bunch of tests (and fixed a bunch of related issues by the way). I didn't test
because
|
The test hydra instance just showed some weird failures with fixed-output derivations (https://hydra.ngi0.nixos.org/build/171 − the cbindgen vendor derivation should build just fine but is marked as failed without anything appearing in the log). This needs a bit of investigation − Might be linked to #875 (comment) because maybe that bit of code also rewrites the output hashes of FO derivations. |
OK, looks like these are “legitimate” has mismatches. So the only issue is that the error isn’t properly reported − which I think is a Nix issue |
It would be really great if whatever the underlying issue is has an
accompanying test wherever the fix belongs :) these sort of mystery
problems are exhausting 😔
On April 29, 2021, GitHub ***@***.***> wrote:
> The test hydra instance just showed some weird failures with fixed-
> output derivations (<https://hydra.ngi0.nixos.org/build/171> − the
> cbindgen vendor derivation should build just fine but is marked as
> failed without anything appearing in the log). This needs a bit of
> investigation − Might be linked to #875 (comment)
> <#875 (comment)>
> because maybe that bit of code also rewrites the output hashes of FO
> derivations.
>
OK, looks like these are “legitimate” has mismatches. So the only
issue is that the error isn’t properly reported − which I think is a
Nix issue
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#875 (comment)>, or
unsubscribe <https://github.com/notifications/unsubscribe-
auth/AAASXLE7UTSXLSFPVILDEATTLEYFRANCNFSM4YCDMVTQ>.
|
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/tweag-nix-dev-update-11/12886/1 |
Possible issue with remote builders and hydra. (note: there should be no CA builds occurring)
Getting errors like:
Reverting to |
We’ve also started getting the same on https://hydra.ngi0.nixos.org/build/202 . That might (quite ironically) be related to d7a1b6b |
Running this branch now with ca-derivation builds, but getting errors in the UI like this: Not sure if that's to be addressed in a future patch or if I hit some edge case. Edit: I made a new jobset and now it seems solved (for that new jobset). |
@tomberek can you try 854dcec ? I think it should fix your issue. @Mindavi there were some known issues with earlier commits in this branch where the database wasn’t properly filled with all the required informations, causing this kind of issue (and which persisted after an upgrade because the incomplete data was still there). Any chance that’s the case for you? |
I guess I broke it with trying to build with ca-derivations first and only then finding out hydra doesn't support it and applying thid patchset. It seems to be ok now though, with a fresh jobset, apart from the (already expected) ui artifacts (notably, empty package names). |
This has now been running pretty smoothly for a while for me. However, there's one thing that's now starting to happen:
I already have systemd built locally, but it appears to be trying to substitute it anyways? And the remote appears to have another hash as well. I'm not sure what's happening here, but it shouldn't be trying to substitute it, I guess. This has happened for some different packages over the last few days, e.g. also for glibc. I have the binary cache for the ngi0 machine that's building the ca derivations enabled, I guess those might also have something to do with this? Versions:
|
t/jobs/dir-with-file-builder.sh
Outdated
#! /bin/sh | ||
|
||
# Workaround for https://github.com/NixOS/nix/pull/6051 | ||
echo "some output" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I read upstream correctly, this workaround shouldn't be needed anymore.
Could it be that we are missing the remote builder check implemented here https://github.com/NixOS/nix/pull/4905/files#diff-e89a2224d1d4c39bf5a2d448b7b2fac1e3e75c7e08e414ee448846266c8bfeab ? Because I am experiencing that hydra tries to use remote builders for ca-derivations that do not have that feature enabled in the daemon or the machines file. Edit: Hydra reported the following error The builder it tried to run the build on is the correct system but doesn't support that feature. When building manually it is confirmed, that the derivation requires ca-derivation. Why did hydra schedule that on that build?
Edit2: I really don't have much of clue but I think the error is here https://github.com/thufschmitt/hydra/blob/master/src/hydra-queue-runner/state.hh#L281C18-L281C35 My clues are that either the check code is wrong or the steps don't have the required step set correct. Other machines with the wrong arch log this line https://github.com/thufschmitt/hydra/blob/master/src/hydra-queue-runner/dispatcher.cc#L250C28-L250C35 Edit3: I think I found the difference in the hydra code https://github.com/thufschmitt/hydra/blob/nix-ca/src/hydra-queue-runner/queue-monitor.cc#L464-L473 and https://github.com/NixOS/nix/blob/master/src/libstore/parsed-derivations.cc#L96-L97 Ericson you are so right, that this code needs to be deduplicated which would fix a ton of bugs like this one. |
I think SuperSandro2000@27b1935 solved that issue but I am not yet 100% sure. Edit: I am not sure actually. |
NixOS/hydra#875 is the next thing to test. Flake lock file updates: • Updated input 'hydra': 'github:NixOS/hydra/d45e14fd4381d1476a606c20ef3573215067472c' (2024-01-24) → 'github:thufschmitt/hydra/002a77a954259ddb49fd262607ba44fb2c3b34a9' (2024-01-24)
Per NixOS/hydra#838 we want to test that NixOS/hydra#875 works both with and without it enabled, so we can deploy without it first.
(For anyone else reading this, the scheduling issues @SuperSandro2000 brings up are now fixed on |
Renamed because since I factored out #1349, what is left just in this PR is just UI changes (and the extra column that those UI changes leverage). |
nix_2_21 is unhappy with this PR/hydra in generell:
|
I think currently (at least) a database migration is missing here. |
NixOS/hydra#875 is the next thing to test. Flake lock file updates: • Updated input 'hydra': 'github:NixOS/hydra/d45e14fd4381d1476a606c20ef3573215067472c' (2024-01-24) → 'github:thufschmitt/hydra/002a77a954259ddb49fd262607ba44fb2c3b34a9' (2024-01-24)
Per NixOS/hydra#838 we want to test that NixOS/hydra#875 works both with and without it enabled, so we can deploy without it first.
Not sure if I ever confirmed this, but lately I have noticed that sometimes builds couldn't be scheduled because the feature was missing, so from my perspective things seem to work as expected 🎉 |
When building packages and system configurations with this PR, it all seems to work with and without content-addressed derivations.
Note: The derivation
So the derivation, as well as the output, is present. Restarting the build (or wait for the queue-runner to do it again), results in the exact same error message. So does repeating or restarting the build. It it impossible to build the system config right now with hydra. This happens on the The only related issue I found was NixOS/nix#6700 which did happen to me i the past, but got fixed later on. I also don't get the The nix-daemon.service doesn't report any errors either, which is why I think it has something to do with how hydra build the CA derivations (ergo this and the related PR's) Also, I already did build the system configuration with hydras own |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/content-addressed-nix-call-for-testers/12881/227 |
Small update. I have been running hydra with this PR for the past month. After I uploaded the (for some reason) broken
everything worked just fine. I had to do this manually a few times in the beginning and everything has been running smoothly since then (with CA nixos configs, normal configs and package sets) Maybe there was a hickup with the local store or something, I don't know. |
FYI: The main thing this PR does is give a better UI, all other ca-derivations support is already part of master. |
Fix #838
There should be a couple of UI artifacts (because the output paths were showing up in a couple places but we don't know them anymore), but everything seems to be working fine otherwise (I managed to build a couple generations of the
nixpkgs-minimal
jobset with this).Depends on #874 (could probably be made to work without it, but I built it on top of it and the effectively touch the same code)Also depends on NixOS/nix#4477Depends on #1349