Build failure: python3Packages.jaxlibWithCuda #296737

Closed
UlyssesZh opened this issue Mar 17, 2024 · 27 comments · Fixed by #321559
Labels
0.kind: build failure A package fails to build 6.topic: cuda Parallel computing platform and API 6.topic: python

Comments

@UlyssesZh
Member

Steps To Reproduce

Build python3Packages.jaxlibWithCuda.

Build log

error: hash mismatch in fixed-output derivation '/nix/store/4jr37chpz7mf3h97lsv0309mmb17ng8d-bazel-build-jaxlib-0.4.24-deps.tar.gz.drv':
         specified: sha256-IEKoHjCOtKZKvU/DUUjbvXldORFJuyO1R3F6CZZDXxM=
            got:    sha256-h4zE+Z6z7odg7Avr54pgsjInBaHf+BqVQUi4SsV3Nqo=

Additional context

Because this hash was updated not long ago by #288857, I think the package may still need some refactoring?

We may also expect #291705 to get everything settled.

Notify maintainers

@samuela @ndl

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

  • system: "x86_64-linux"
  • host os: Linux 6.1.81, NixOS, 23.11 (Tapir), 23.11.5353.878ef7d9721b
  • multi-user?: yes
  • sandbox: yes
  • version: nix-env (Nix) 2.18.1
  • channels(root): "nixos-23.11, nixos-hardware, nixos-unstable"
  • nixpkgs: /nix/var/nix/profiles/per-user/root/channels/nixos

Add a 👍 reaction to issues you find important.

@UlyssesZh UlyssesZh added the 0.kind: build failure A package fails to build label Mar 17, 2024
@dotlambda dotlambda added 6.topic: python 6.topic: cuda Parallel computing platform and API labels Mar 17, 2024
@samuela
Member

samuela commented Mar 18, 2024

Huh, weird. When I wrote #288857 that hash worked fine. I wonder what changed?

@UlyssesZh
Member Author

Yes, I was wondering the same thing, because I knew you had updated that hash not long ago.

Does that hash depend on any online resources that may change or on other packages? If not, then this package may have some obscure reproducibility issues.

This package is so complex that I cannot see at a glance what it is doing there.

@samuela
Member

samuela commented Mar 18, 2024

Does that hash depend on any online resources that may change

Yes, but it is my understanding that it's pulling resources that are hash-fixed in bazel-land, so iiuc they shouldn't change unless src changes... but then again clearly i have some misunderstanding of the behavior here
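
To illustrate why a single hash over a bundle of fetched dependencies is so sensitive, here is a minimal Python sketch. It is only an analogy: `tree_digest` is a hypothetical helper, and it does not reproduce how Nix or bazel actually serialize or hash their outputs. It shows that if any one file in the bundle drifts, the combined digest changes, even when everything else is pinned.

```python
import hashlib
import tempfile
from pathlib import Path


def tree_digest(root: Path) -> str:
    """Hash every regular file under `root`, visiting paths in sorted order.

    Simplified model only; this is NOT Nix's hashing scheme.
    """
    h = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix().encode()
        data = path.read_bytes()
        # Length-prefix each field so file boundaries are unambiguous.
        h.update(len(rel).to_bytes(8, "big") + rel)
        h.update(len(data).to_bytes(8, "big") + data)
    return h.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "external").mkdir()
        dep = root / "external" / "dep.txt"  # hypothetical fetched file
        dep.write_text("pinned contents\n")
        before = tree_digest(root)
        # Simulate one fetched file being regenerated with different bytes.
        dep.write_text("pinned contents, regenerated\n")
        after = tree_digest(root)
        print("digest changed:", before != after)
```

Under this model, a mismatch does not say which input moved, only that at least one byte somewhere in the bundle did.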

@GaetanLepage
Contributor

python311Packages.jaxlibWithCuda builds fine for me on master.

@samuela
Member

samuela commented Mar 25, 2024

@UlyssesZh Is this issue still reproducible for you?

@UlyssesZh
Member Author

Yes. I just tested on the master branch at commit e4a33b6. The exact same error.

@samuela
Member

samuela commented Mar 25, 2024

Are you using any overlays? Have you set any nixpkgs config flags?

@UlyssesZh
Member Author

I have a local clone of nixpkgs at ~/projects/nixpkgs, and I ran the command

NIXPKGS_ALLOW_UNFREE=1 nix build --print-build-logs --file ~/projects/nixpkgs python3Packages.jaxlibWithCuda

Does it depend on my nixos system config?

@samuela
Member

samuela commented Mar 25, 2024

iirc it will depend on eg ~/.config/nixpkgs/ but i'm not sure about the nixos system config.

What commit of nixpkgs are you on? Does it reproduce on the latest master?
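
A quick way to answer the overlay and config question on a given machine is to look for user-level inputs that nixpkgs may pick up during an impure evaluation. The sketch below is an assumption-laden helper, not an official tool: `impure_nixpkgs_inputs` is a hypothetical name, and the list of locations is a best recollection of the usual defaults (`NIXPKGS_CONFIG`, `~/.config/nixpkgs/config.nix`, `~/.config/nixpkgs/overlays.nix`, `~/.config/nixpkgs/overlays/`, and the legacy `~/.nixpkgs/config.nix`), plus the `NIXPKGS_ALLOW_UNFREE` variable used in the command above.

```python
import os
from pathlib import Path
from typing import Mapping

# Assumed default locations; adjust if your setup differs.
CANDIDATE_FILES = (
    ".config/nixpkgs/config.nix",
    ".config/nixpkgs/overlays.nix",
    ".config/nixpkgs/overlays",
    ".nixpkgs/config.nix",
)
CANDIDATE_ENV = ("NIXPKGS_CONFIG", "NIXPKGS_ALLOW_UNFREE")


def impure_nixpkgs_inputs(home: Path, env: Mapping[str, str]) -> list[str]:
    """List environment variables and files that could influence evaluation."""
    found = []
    for var in CANDIDATE_ENV:
        if env.get(var):
            found.append(f"env {var}={env[var]}")
    for rel in CANDIDATE_FILES:
        if (home / rel).exists():
            found.append(str(home / rel))
    return found


if __name__ == "__main__":
    for item in impure_nixpkgs_inputs(Path.home(), os.environ):
        print(item)
```

An empty result only rules out these particular locations; it says nothing about NixOS system configuration or channel contents.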

@UlyssesZh
Member Author

I don't have ~/.config/nixpkgs. I said I was on e4a33b6, which is the latest.

@samuela
Member

samuela commented Mar 25, 2024

Latest (at the time of writing) is 07262b1. does building on that commit work for you?

@UlyssesZh
Member Author

That commit does not build either.

@lromor
Contributor

lromor commented Mar 27, 2024

Hi, I'm also having the same issue, with the same hash mismatch. I tested the build on two different systems. If you happen to have a system on which the build runs fine, would you mind sharing the tar.gz output of the derivation:

/nix/store/spw4qzg6q9a419f3brv730szykkgfv5c-bazel-build-jaxlib-0.4.24.drv

so I can diff it? I could also send you mine.
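
For anyone who ends up with two copies of the deps tarball, a member-by-member comparison could look like the following Python sketch. The function names `tar_manifest` and `diff_tarballs` are made up for illustration, and the archive paths in the usage line are placeholders. It compares names, metadata, and content digests using only the standard library.

```python
import hashlib
import sys
import tarfile


def tar_manifest(path: str) -> dict[str, tuple]:
    """Map each member name to its metadata and content digest."""
    manifest = {}
    with tarfile.open(path, "r:*") as tar:
        for member in tar:
            digest = ""
            if member.isfile():
                # Reads the whole member into memory; fine for a sketch.
                digest = hashlib.sha256(tar.extractfile(member).read()).hexdigest()
            manifest[member.name] = (
                member.type, member.mode, member.uid, member.gid,
                member.mtime, member.size, member.linkname, digest,
            )
    return manifest


def diff_tarballs(a: str, b: str) -> dict[str, list[str]]:
    """Report members unique to either archive and members that differ."""
    ma, mb = tar_manifest(a), tar_manifest(b)
    return {
        "only_in_a": sorted(ma.keys() - mb.keys()),
        "only_in_b": sorted(mb.keys() - ma.keys()),
        "changed": sorted(n for n in ma.keys() & mb.keys() if ma[n] != mb[n]),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python difftar.py good-deps.tar.gz bad-deps.tar.gz
    for kind, names in diff_tarballs(sys.argv[1], sys.argv[2]).items():
        for name in names:
            print(kind, name)
```

Note that this ignores member order and compression-level differences, so two archives can still hash differently even when this report is empty.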

@GaetanLepage
Contributor

Should be fixed by #291705

@GaetanLepage
Contributor

Fixed by #291705

@github-project-automation github-project-automation bot moved this from New to ✅ Done in CUDA Team May 13, 2024
@phiadaarr
Contributor

Hi everyone,

sorry to reactivate this issue but if I check out the current master branch 219bc27 and run nix-build -A python3Packages.jaxlibWithCuda, I end up with the same error as above (just with different hashes).

Can anyone reproduce this behaviour?

@phiadaarr phiadaarr reopened this Jun 17, 2024
@github-project-automation github-project-automation bot moved this from ✅ Done to 📋 Backlog in CUDA Team Jun 17, 2024
@samuela
Member

samuela commented Jun 17, 2024

just with different hashes

what hashes are you seeing now?

@GaetanLepage
Contributor

I can rebuild the package just fine...

@phiadaarr
Contributor

[Image: screenshot of the hash mismatch output]
These are the hashes I get after running nix-build -A python3Packages.jaxlibWithCuda on the current master (3734012).

I am puzzled by the fact that this exact command builds fine for @GaetanLepage. My expectation would have been that the behaviour is the same if we are on the same platform (I am on x86_64-linux).

Do you have an idea how we can dig deeper into this?

@oddlama
Contributor

oddlama commented Jun 21, 2024

I also just stumbled upon this issue and I do get the exact same hash mismatch as @phiadaarr :

error: hash mismatch in fixed-output derivation '/nix/store/kb6k4nhlkpjcmi2hdg398kgfl80rgkqx-bazel-build-jaxlib-0.4.28-deps.tar.gz.drv':
         specified: sha256-VGNMf5/DgXbgsu1w5J1Pmrukw+7UO31BNU+crKVsX5k=
            got:    sha256-vUoAPkYKEnHkV4fw6BI0mCeuP2e8BMCJnVuZMm9LwSA=

@GaetanLepage
Contributor

GaetanLepage commented Jun 21, 2024

I also just stumbled upon this issue and I do get the exact same hash mismatch as @phiadaarr :

Well, my only hypothesis is that the bazel dependencies derivation is simply non-deterministic :'(
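
One way to probe that hypothesis is to collect the error output from several machines and check whether the `got` hashes cluster into a few values or differ every time. The Python sketch below parses the mismatch message format seen in this thread. The helper names and the reporter labels are hypothetical, and the regular expression is fitted to the log excerpts quoted here rather than to any documented format.

```python
import re
from collections import defaultdict

# Pattern fitted to the error text quoted in this thread (an assumption).
MISMATCH = re.compile(
    r"hash mismatch in fixed-output derivation '(?P<drv>[^']+)':\s*"
    r"specified:\s*(?P<specified>\S+)\s*"
    r"got:\s*(?P<got>\S+)"
)


def parse_mismatches(log: str) -> list[dict[str, str]]:
    """Extract every (drv, specified, got) triple from a build log."""
    return [m.groupdict() for m in MISMATCH.finditer(log)]


def group_by_got(reports: dict[str, str]) -> dict[str, list[str]]:
    """Group reporter labels by the hash their machine actually produced."""
    groups = defaultdict(list)
    for reporter, log in reports.items():
        for mismatch in parse_mismatches(log):
            groups[mismatch["got"]].append(reporter)
    return dict(groups)


# Log excerpt copied from the comment above.
SAMPLE = """\
error: hash mismatch in fixed-output derivation '/nix/store/kb6k4nhlkpjcmi2hdg398kgfl80rgkqx-bazel-build-jaxlib-0.4.28-deps.tar.gz.drv':
         specified: sha256-VGNMf5/DgXbgsu1w5J1Pmrukw+7UO31BNU+crKVsX5k=
            got:    sha256-vUoAPkYKEnHkV4fw6BI0mCeuP2e8BMCJnVuZMm9LwSA=
"""

if __name__ == "__main__":
    # "machine-a" and "machine-b" are placeholder labels.
    for got, who in group_by_got({"machine-a": SAMPLE, "machine-b": SAMPLE}).items():
        print(got, who)
```

If many machines agree on the same `got` value, drifting inputs or a stale pinned hash look more likely than per-build randomness; if every machine disagrees, non-determinism becomes the stronger suspect.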

@oddlama
Contributor

oddlama commented Jun 21, 2024

@GaetanLepage Does this still build for you? If it does I can offer to send you an archive of my jaxlib with the changed hash so you can run a diff to find out what is wrong. (Or the other way round ofc)

@GaetanLepage
Contributor

Does this still build for you

Yes, I have tried once again right now and it works fine...

If it does I can offer to send you an archive of my jaxlib with the changed hash so you can run a diff to find out what is wrong. (Or the other way round ofc)

Sure, send me what you have :)

@oddlama
Contributor

oddlama commented Jun 21, 2024

Alright, I've updated today and will build against nixpkgs c00d587b1a1afbf200b1d8f0b0e4ba9deb1c7f0e. My machine will probably take some time to completely build it though, so I'll post back once it's done.

@aryanjassal

I'm getting a similar problem when I'm trying to compile jaxlibWithCuda.

[user@system:~/jax]# nix develop
warning: updating lock file '/root/jax/flake.lock':
• Updated input 'nixpkgs':
    'github:nixos/nixpkgs/70bdadeb94ffc8806c0570eb5c2695ad29f0e421' (2024-01-03)
  → 'github:nixos/nixpkgs/a71e967ef3694799d0c418c98332f7ff4cc5f6af' (2024-06-22)
trace: warning: cudaPackages.autoAddDriverRunpath is deprecated, use pkgs.autoAddDriverRunpath instead
error: hash mismatch in fixed-output derivation '/nix/store/hsqlgbs9adm65rxxwwzdfp7lmg2i49dv-bazel-build-jaxlib-0.4.28-deps.tar.gz.drv':
         specified: sha256-VGNMf5/DgXbgsu1w5J1Pmrukw+7UO31BNU+crKVsX5k=
            got:    sha256-vAF5JwpINIp4pn1tFFl5059m/8/hn2cmEia04h9hHAw=
error: 1 dependencies of derivation '/nix/store/xza31n967qncpwinwyb62bf8a16c31vn-bazel-build-jaxlib-0.4.28.drv' failed to build
error: 1 dependencies of derivation '/nix/store/kl4wc1fl4jwl169f5zp79lqfjdf66nv8-python3.11-jaxlib-0.4.28.drv' failed to build
error: 1 dependencies of derivation '/nix/store/0xxkiqnajmcw1v7pb0grpnv827wn8h7i-jax-devshell-env.drv' failed to build

The commit a71e96 is from the nixos-unstable branch. After going through multiple issues and threads, 70bdad was the only commit I found that worked on my system.

I have also observed this issue when compiling tensorflowWithCuda (#317090), so the issue of build failures due to bazel hash mismatch isn't just limited to jaxlib. So, I don't think that this issue has been properly resolved yet.

@eightysteele

I'm not sure if this is helpful--or noise!--but I just hit a similar issue with jaxlibWithCuda (see below for output, repro, and system info).

Output

INFO: Analyzed 2 targets (227 packages loaded, 20572 targets configured).
INFO: Found 2 targets...
INFO: Elapsed time: 153.583s, Critical Path: 0.00s
INFO: 0 processes.
INFO: Build completed successfully, 0 total actions
buildPhase completed in 2 minutes 34 seconds
Running phase: installPhase
installPhase completed in 59 seconds
error: hash mismatch in fixed-output derivation '/nix/store/r52y4kivjn6jgwg9mf7xpsfcalc1p8j1-bazel-build-jaxlib-0.4.28-deps.tar.gz.drv':
         specified: sha256-vUoAPkYKEnHkV4fw6BI0mCeuP2e8BMCJnVuZMm9LwSA=
            got:    sha256-vAF5JwpINIp4pn1tFFl5059m/8/hn2cmEia04h9hHAw=
error: 1 dependencies of derivation '/nix/store/1sz9z9zjrrr232l4nvmqf252pzb00fsq-bazel-build-jaxlib-0.4.28.drv' failed to build
error: 1 dependencies of derivation '/nix/store/mkwl4813vvmb58qdqxv37v93lbc2yb7a-python3.12-jaxlib-0.4.28.drv' failed to build

Repro

eighty:λ nix-shell --pure
# shell.nix
let
  nixpkgs = fetchTarball
    "https://github.com/NixOS/nixpkgs/archive/b9ea3884e9a0c08e5c408bdd22f10eff9467d82d.tar.gz";
  pkgs = import nixpkgs {
    config = { allowUnfree = true; };
    overlays = [ ];
  };
  cudaPkgs = pkgs.cudaPackages_12_2;
  pythonPkgs = pkgs.python312Packages;

in pkgs.mkShell {
  packages = with pkgs;
    [
      cudaPkgs.cudatoolkit
      pythonPkgs.jaxlibWithCuda
    ];
}

System

eighty:λ nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 6.9.3-76060903-generic, Pop!_OS, 22.04 LTS, nobuild`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.22.1`
 - channels(root): `"nixpkgs"`
 - nixpkgs: `/home/eighty/.nix-defexpr/channels/nixpkgs`

eighty:λ nix-channel --list
nixpkgs https://nixos.org/channels/nixpkgs-unstable

@samuela
Member

samuela commented Jun 29, 2024

Btw we're tracking the hash issues in #321920.

Projects
Status: Done

9 participants