
Error: unsupported dtype BF16 for op matmul #32

Open
kadirnar opened this issue Jan 8, 2025 · 15 comments
kadirnar commented Jan 8, 2025

I tested the fp16, fp8, and f4 models, but I'm getting this error. Can you help?

@EricLBuehler (Owner)

@kadirnar can you please provide a minimum reproducible example?


kadirnar commented Jan 8, 2025


diffusion_rs_cli --scale 3.5 --num-steps 50 dduf -f FLUX.1-dev-Q4-bnb.dduf 
2025-01-08T15:18:35.608729Z  INFO diffusion_rs_core::pipelines: loading from source: dduf file: FLUX.1-dev-Q4-bnb.dduf.
2025-01-08T15:18:35.608954Z  INFO diffusion_rs_core::pipelines: model architecture is: flux
2025-01-08T15:18:35.679229Z  INFO diffusion_rs_core::pipelines::flux: loading CLIP model
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:00<00:00, 1147.15it/s]
2025-01-08T15:18:35.922026Z  INFO diffusion_rs_core::pipelines::flux: loading T5 model
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 138/138 [00:00<00:00, 306.10it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 513/513 [00:04<00:00, 348.27it/s]
2025-01-08T15:18:42.869190Z  INFO diffusion_rs_core::pipelines::flux: loading VAE model
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 244/244 [00:00<00:00, 3129.41it/s]
2025-01-08T15:18:43.007140Z  INFO diffusion_rs_core::pipelines::flux: loading FLUX model
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2672/2672 [00:04<00:00, 498.33it/s]
2025-01-08T15:18:47.849220Z  INFO diffusion_rs_core::pipelines::flux: FLUX pipeline using a guidance-distilled model: true
◇  Height:
│  1024
│
◇  Width:
│  1024
│
◇  Prompt:
│  a woman
│
Error: unsupported dtype BF16 for op matmul

@EricLBuehler (Owner)

@kadirnar what feature flags did you use to install the CLI?


kadirnar commented Jan 8, 2025

> @kadirnar what feature flags did you use to install the CLI?

cargo install diffusion_rs_cli  --features cuda

I don't know Rust but I solved it by using GPT4 to create src and cargo files. I might have done the installation incorrectly.

Error Message

error: failed to run custom build command for `diffusion_rs_common v0.1.0`

Caused by:
  process didn't exit successfully: `/tmp/cargo-installzcx4CF/release/build/diffusion_rs_common-9ed7d75a0ac27c5a/build-script-build` (exit status: 101)
  --- stdout
  cargo:rerun-if-changed=build.rs
  cargo:rerun-if-changed=src/cuda_kernels/compatibility.cuh
  cargo:rerun-if-changed=src/cuda_kernels/cuda_utils.cuh
  cargo:rerun-if-changed=src/cuda_kernels/binary_op_macros.cuh
  cargo:info=["/usr", "/usr/local/cuda", "/opt/cuda", "/usr/lib/cuda", "C:/Program Files/NVIDIA GPU Computing Toolkit", "C:/CUDA"]
  cargo:rerun-if-env-changed=CUDA_COMPUTE_CAP
  cargo:rustc-env=CUDA_COMPUTE_CAP=90

  --- stderr
  thread 'main' panicked at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/bindgen_cuda-0.1.5/src/lib.rs:527:9:
  nvcc cannot target gpu arch 90. Available nvcc targets are [35, 37, 50, 52, 53, 60, 61, 62, 70, 72, 75, 80, 86, 87].
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
warning: build failed, waiting for other jobs to finish...
error: failed to compile `diffusion_rs_cli v0.1.0`, intermediate artifacts can be found at `/tmp/cargo-installzcx4CF`.
To reuse those artifacts with a future compilation, set the environment variable `CARGO_TARGET_DIR` to that path.

GPU: H100


EricLBuehler commented Jan 8, 2025

> I don't know Rust but I solved it by using GPT4 to create src and cargo files. I might have done the installation incorrectly.

That command is correct for a CUDA machine. Since you have an H100, bf16 should definitely be supported.

I think I'm a bit unclear, though: it seems you were able to build and run, which is how you got the Error: unsupported dtype BF16 for op matmul. Regardless, this issue seems to result from the fact that your GPU's compute capability is 9.0 while your nvcc version is too old to target it. Perhaps you can try updating nvcc?

Can you please run and let me know the output of:

nvcc -V
nvcc --list-gpu-code


kadirnar commented Jan 8, 2025

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
sm_35
sm_37
sm_50
sm_52
sm_53
sm_60
sm_61
sm_62
sm_70
sm_72
sm_75
sm_80
sm_86
sm_87

@EricLBuehler (Owner)

Yes, can you please update nvcc to a major version of 12 (11.5 most likely does not support compute cap 9.0)?
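(Illustration: the gap between the GPU's compute capability and the toolkit's targets can be checked mechanically. This is a minimal sketch, with the `targets` list hard-coded from the tail of the nvcc 11.5 output above; on a real machine you would capture it live with `targets="$(nvcc --list-gpu-code)"`.)

```shell
# Check whether nvcc can emit code for a given compute capability.
supports_cap() {
  # $1 = compute cap (e.g. 90), $2 = newline-separated sm_* target list
  printf '%s\n' "$2" | grep -qx "sm_$1"
}

# Hard-coded from the nvcc 11.5 output above (tops out at sm_87).
targets="sm_80
sm_86
sm_87"

if supports_cap 90 "$targets"; then
  echo "sm_90 supported"
else
  echo "sm_90 NOT supported - update the CUDA toolkit"
fi
```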


Aloso commented Jan 8, 2025

I'm having the exact same issue, but with the CPU backend (I don't have CUDA; I installed with cargo install diffusion_rs_cli).

CPU: AMD Ryzen 7 7840HS w/ Radeon 780M Graphics @ 16x 5.137GHz
GPU: AMD Radeon 780M (radeonsi, gfx1103_r1, LLVM 18.1.8, DRM 3.59, 6.11.10-2-MANJARO)

@EricLBuehler (Owner)

@Aloso I think the root cause of your issue is different (@kadirnar seems to have an outdated CUDA nvcc installed; the CPU backend doesn't support bf16 yet). I'm adding support for automatic dtype selection in #33.
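(Illustration only: the selection logic being added is along these lines. The names and exact fallback choices here are assumptions for the sketch, not the crate's actual API; bf16 is natively supported on CUDA devices with compute capability >= 8.0.)

```shell
# Sketch of automatic dtype selection: prefer bf16 only on CUDA devices
# with compute cap >= 8.0; otherwise fall back to a supported dtype.
pick_dtype() {
  local cap="$1"          # CUDA compute cap; empty for the CPU backend
  if [ -z "$cap" ]; then
    echo "F32"            # CPU backend: no bf16 support yet
  elif [ "$cap" -ge 80 ]; then
    echo "BF16"           # Ampere/Hopper: native bf16
  else
    echo "F16"            # older CUDA GPUs
  fi
}

pick_dtype 90    # H100 -> BF16
pick_dtype ""    # CPU  -> F32
```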


kadirnar commented Jan 8, 2025

@EricLBuehler I updated the nvcc version.

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Oct_29_23:50:19_PDT_2024
Cuda compilation tools, release 12.6, V12.6.85
Build cuda_12.6.r12.6/compiler.35059454_0
sm_50
sm_52
sm_53
sm_60
sm_61
sm_62
sm_70
sm_72
sm_75
sm_80
sm_86
sm_87
sm_89
sm_90

I ran it again and I'm getting the same error.

cmd:

diffusion_rs_cli --scale 3.5 --num-steps 50 dduf -f FLUX.1-dev-Q4-bnb.dduf

Error Message:

2025-01-08T21:03:38.605977Z  INFO diffusion_rs_core::pipelines: loading from source: dduf file: FLUX.1-dev-Q4-bnb.dduf.
2025-01-08T21:03:38.630981Z  INFO diffusion_rs_core::pipelines: model architecture is: flux
2025-01-08T21:03:38.739132Z  INFO diffusion_rs_core::pipelines::flux: loading CLIP model
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:01<00:00, 206.79it/s]
2025-01-08T21:03:40.021552Z  INFO diffusion_rs_core::pipelines::flux: loading T5 model
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 138/138 [00:08<00:00, 59.80it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 513/513 [00:26<00:00, 3709.82it/s]
2025-01-08T21:04:08.905775Z  INFO diffusion_rs_core::pipelines::flux: loading VAE model
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 244/244 [00:00<00:00, 224.73it/s]
2025-01-08T21:04:09.905606Z  INFO diffusion_rs_core::pipelines::flux: loading FLUX model
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2672/2672 [00:33<00:00, 148.89it/s]
2025-01-08T21:04:43.615974Z  INFO diffusion_rs_core::pipelines::flux: FLUX pipeline using a guidance-distilled model: true
◇  Height:
│  1024
│
◇  Width:
│  1024
│
◇  Prompt:
│  a woman
│
Error: unsupported dtype BF16 for op matmul

@EricLBuehler (Owner)

@kadirnar I merged #33 which should report some useful information on CUDA, can you please install the CLI from source to use this latest version and let me know the output?

@Aloso I merged #33 which should hopefully resolve the issue that you are having! Can you please install the CLI from source to use this latest version? I'll let you know when the next release comes out with this feature (shortly).
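(For anyone following along, installing from source rather than from crates.io looks roughly like this. The repository URL is an assumption inferred from the crate names and build paths in this thread; adjust it if the project lives elsewhere. Not run here, since it needs network access and a Rust toolchain.)

```shell
# Assumed repo URL; adjust if the project moved.
git clone https://github.com/EricLBuehler/diffusion-rs
cd diffusion-rs
# Drop --features cuda for a CPU-only build.
cargo install --path diffusion_rs_cli --features cuda
```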


Aloso commented Jan 8, 2025

I'm now getting a different error:

Error: dtype mismatch in matmul_with_alpha, lhs: F16, rhs: BF16


kadirnar commented Jan 8, 2025

 diffusion_rs_cli --scale 3.5 --num-steps 50 dduf -f FLUX.1-dev-Q4-bnb.dduf 
2025-01-08T22:02:29.239205Z  INFO diffusion_rs_core::pipelines: loading from source: dduf file: FLUX.1-dev-Q4-bnb.dduf.
2025-01-08T22:02:29.239404Z  INFO diffusion_rs_core::pipelines: model architecture is: flux
2025-01-08T22:02:30.184847Z  INFO diffusion_rs_core::util::auto_dtype: detected minimum CUDA compute capability 9
2025-01-08T22:02:30.287975Z  INFO diffusion_rs_core::util::auto_dtype: dtype selected is BF16.
2025-01-08T22:02:30.354903Z  INFO diffusion_rs_core::pipelines::flux: loading CLIP model
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:00<00:00, 2303.86it/s]
2025-01-08T22:02:30.445648Z  INFO diffusion_rs_core::pipelines::flux: loading T5 model
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 138/138 [00:00<00:00, 1186.43it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 513/513 [00:02<00:00, 1445.68it/s]
Error: DriverError(CUDA_ERROR_NOT_FOUND, "named symbol not found") when loading cast_f32_bf16

@EricLBuehler (Owner)

@kadirnar

> Error: DriverError(CUDA_ERROR_NOT_FOUND, "named symbol not found") when loading cast_f32_bf16

I merged #34 which should fix this, can you please reinstall and try again?

@Aloso

> Error: dtype mismatch in matmul_with_alpha, lhs: F16, rhs: BF16

This is confusing: after #34, I cannot reproduce the issue when I build for plain CPU on my machine. #34 fixed a different bug, and now it runs on my machine. Can you please reinstall and try again?


kadirnar commented Jan 9, 2025

Build:

cargo install --path diffusion_rs_cli --features cuda

Error Message:

   Compiling zip v2.2.2
error[E0599]: no variant or associated item named `CUBLASLT_MATMUL_DESC_A_SCALE_POINTER` found for enum `cublasLtMatmulDescAttributes_t` in the current scope
   --> diffusion_rs_backend/src/cublaslt/matmul.rs:183:63
    |
183 |             Matrix::A => sys::cublasLtMatmulDescAttributes_t::CUBLASLT_MATMUL_DESC_A_SCALE_POINTER,
    |                                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ variant or associated item not found in `cublasLtMatmulDescAttributes_t`
    |
help: there is a variant with a similar name
    |
183 |             Matrix::A => sys::cublasLtMatmulDescAttributes_t::CUBLASLT_MATMUL_DESC_BIAS_POINTER,
    |                                                               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

error[E0599]: no variant or associated item named `CUBLASLT_MATMUL_DESC_B_SCALE_POINTER` found for enum `cublasLtMatmulDescAttributes_t` in the current scope
   --> diffusion_rs_backend/src/cublaslt/matmul.rs:184:63
    |
184 |             Matrix::B => sys::cublasLtMatmulDescAttributes_t::CUBLASLT_MATMUL_DESC_B_SCALE_POINTER,
    |                                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ variant or associated item not found in `cublasLtMatmulDescAttributes_t`
    |
help: there is a variant with a similar name
    |
184 |             Matrix::B => sys::cublasLtMatmulDescAttributes_t::CUBLASLT_MATMUL_DESC_BIAS_POINTER,
    |                                                               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

error[E0599]: no variant or associated item named `CUBLASLT_MATMUL_DESC_C_SCALE_POINTER` found for enum `cublasLtMatmulDescAttributes_t` in the current scope
   --> diffusion_rs_backend/src/cublaslt/matmul.rs:185:63
    |
185 |             Matrix::C => sys::cublasLtMatmulDescAttributes_t::CUBLASLT_MATMUL_DESC_C_SCALE_POINTER,
    |                                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ variant or associated item not found in `cublasLtMatmulDescAttributes_t`
    |
help: there is a variant with a similar name
    |
185 |             Matrix::C => sys::cublasLtMatmulDescAttributes_t::CUBLASLT_MATMUL_DESC_BIAS_POINTER,
    |                                                               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

error[E0599]: no variant or associated item named `CUBLASLT_MATMUL_DESC_D_SCALE_POINTER` found for enum `cublasLtMatmulDescAttributes_t` in the current scope
   --> diffusion_rs_backend/src/cublaslt/matmul.rs:186:63
    |
186 |             Matrix::D => sys::cublasLtMatmulDescAttributes_t::CUBLASLT_MATMUL_DESC_D_SCALE_POINTER,
    |                                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ variant or associated item not found in `cublasLtMatmulDescAttributes_t`
    |
help: there is a variant with a similar name
    |
186 |             Matrix::D => sys::cublasLtMatmulDescAttributes_t::CUBLASLT_MATMUL_DESC_BIAS_POINTER,
    |                                                               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

error[E0599]: no variant or associated item named `CUDA_R_8F_E4M3` found for enum `diffusion_rs_common::core::cuda_backend::mistralrs_cudarc_fork::cublaslt::sys::cudaDataType_t` in the current scope
   --> diffusion_rs_backend/src/cublaslt/matmul.rs:634:30
    |
634 |         sys::cudaDataType_t::CUDA_R_8F_E4M3
    |                              ^^^^^^^^^^^^^^ variant or associated item not found in `cudaDataType_t`

For more information about this error, try `rustc --explain E0599`.
error: could not compile `diffusion_rs_backend` (lib) due to 5 previous errors
warning: build failed, waiting for other jobs to finish...
error: failed to compile `diffusion_rs_cli v0.1.0 (/home/ubuntu/kadir_dev/diff_opt/diffusion-rs/diffusion_rs_cli)`, intermediate artifacts can be found at `/home/ubuntu/kadir_dev/diff_opt/diffusion-rs/target`.
To reuse those artifacts with a future compilation, set the environment variable `CARGO_TARGET_DIR` to that path.
