
HelloPyTorch CUDA Operator Test Fails on Windows #8549

Open
i-chiang opened this issue Jan 8, 2025 · 0 comments

i-chiang commented Jan 8, 2025

Environment:

  • OS: Windows 11
  • PyTorch version: 1.13.1+cu117
  • Python version: 3.7.12
  • CUDA version: 11.7
  • GPU models and configuration: NVIDIA GeForce GTX 1070, NVIDIA GeForce RTX 4060
  • Halide version: Release v19.0.0 (Halide-19.0.0-x86-64-windows)

I built add_generator.cpp with Visual Studio 2022 and ran it to generate the corresponding static library, C header, and PyTorch wrapper.
I also added the following flags in setup.py to work around compilation errors:

compile_args = ["/std:c++17", "/sdl-"]
link_args = ["/FORCE:MULTIPLE"]
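
For context, here is roughly how those flags are wired into the build (a minimal sketch assuming the torch.utils.cpp_extension-based setup.py from apps/HelloPyTorch; the module name halide_ops and the source file name are illustrative, not necessarily the exact ones the app uses):

from setuptools import setup
from torch.utils.cpp_extension import CUDAExtension, BuildExtension

compile_args = ["/std:c++17", "/sdl-"]  # MSVC: enable C++17, disable SDL checks
link_args = ["/FORCE:MULTIPLE"]         # MSVC linker: tolerate multiply defined symbols

setup(
    name="halide_ops",                       # illustrative module name
    ext_modules=[
        CUDAExtension(
            name="halide_ops",
            sources=["add_pytorch.cpp"],     # generated PyTorch wrapper (assumed file name)
            extra_compile_args=compile_args,
            extra_link_args=link_args,
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)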

When I run test.py, the Halide PyTorch CPU operator passes all tests, but the Halide PyTorch CUDA operator fails gradcheck. I also observed that the CUDA operator produces different numerical tensors each time the test is run.
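
A check along the following lines reproduces that nondeterminism (a minimal sketch: halide_ops.add is an assumed name for the generated CUDA binding, and the actual binding imported by test.py may differ):

import torch as th
import halide_ops  # the compiled extension from setup.py (assumed name)

a = th.randn(8, 8, device="cuda", dtype=th.float64)
b = th.randn(8, 8, device="cuda", dtype=th.float64)

# The generated Halide wrappers write into a preallocated output tensor.
out1 = th.empty_like(a)
out2 = th.empty_like(a)
halide_ops.add(a, b, out1)
halide_ops.add(a, b, out2)
th.cuda.synchronize()

print(th.equal(out1, out2))  # prints False when the op is nondeterministic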

Here are the error messages I received:

Testing Halide PyTorch CPU operator...
.Double-precision mode, backward_op: add_grad
Test ran successfully: difference is 0.0
.Double-precision mode, backward_op: add_halidegrad
Test ran successfully: difference is 0.0
.Testing Halide PyTorch CPU operator...
.Single-precision mode, backward_op: add_grad
Test ran successfully: difference is 0.0
.Single-precision mode, backward_op: add_halidegrad
Test ran successfully: difference is 0.0
.Testing Halide PyTorch CUDA operator...
.Double-precision mode, backward_op: add_grad
ETesting Halide PyTorch CUDA operator...
.Single-precision mode, backward_op: add_grad
E
======================================================================
ERROR: test_gpu_double (__main__.TestAdd)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test.py", line 32, in test_gpu_double
    self._test_add(is_cuda=True, is_double=True)
  File "test.py", line 72, in _test_add
    res = th.autograd.gradcheck(add, [self.a, self.b], eps=1e-2)
  File "D:\elizabeth\pytorch-gpu\lib\site-packages\torch\autograd\gradcheck.py", line 1418, in gradcheck
    return _gradcheck_helper(**args)
  File "D:\elizabeth\pytorch-gpu\lib\site-packages\torch\autograd\gradcheck.py", line 1435, in _gradcheck_helper
    check_undefined_grad=check_undefined_grad)
  File "D:\elizabeth\pytorch-gpu\lib\site-packages\torch\autograd\gradcheck.py", line 1076, in _gradcheck_real_imag
    rtol, atol, check_grad_dtypes, nondet_tol)
  File "D:\elizabeth\pytorch-gpu\lib\site-packages\torch\autograd\gradcheck.py", line 1131, in _slow_gradcheck
    raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag))
torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
numerical:tensor([[   1.0000,    0.0000,    0.0000,  ...,    0.0000,    0.0000,    0.0000],
        [   0.0000,    1.0000,    0.0000,  ...,    0.0000,    0.0000,    0.0000],
        [   0.0000,    0.0000,    1.0000,  ...,    0.0000,    0.0000,    0.0000],
        ...,
        [   0.0000,    0.0000,    0.0000,  ...,    1.0000,    0.0000,    0.0000],
        [-149.5000,    0.0000,    0.0000,  ...,    0.0000,    0.0000,    0.0000],
        [ 150.5000,  200.0000,  200.0000,  ...,  200.0000,  200.0000,  200.5000]], device='cuda:0', dtype=torch.float64)
analytical:tensor([[1., 0., 0.,  ..., 0., 0., 0.],
        [0., 1., 0.,  ..., 0., 0., 0.],
        [0., 0., 1.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 1., 0., 0.],
        [0., 0., 0.,  ..., 0., 1., 0.],
        [0., 0., 0.,  ..., 0., 0., 1.]], device='cuda:0', dtype=torch.float64)


======================================================================
ERROR: test_gpu_single (__main__.TestAdd)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test.py", line 27, in test_gpu_single
    self._test_add(is_cuda=True, is_double=False)
  File "test.py", line 72, in _test_add
    res = th.autograd.gradcheck(add, [self.a, self.b], eps=1e-2)
  File "D:\elizabeth\pytorch-gpu\lib\site-packages\torch\autograd\gradcheck.py", line 1418, in gradcheck
    return _gradcheck_helper(**args)
  File "D:\elizabeth\pytorch-gpu\lib\site-packages\torch\autograd\gradcheck.py", line 1435, in _gradcheck_helper
    check_undefined_grad=check_undefined_grad)
  File "D:\elizabeth\pytorch-gpu\lib\site-packages\torch\autograd\gradcheck.py", line 1076, in _gradcheck_real_imag
    rtol, atol, check_grad_dtypes, nondet_tol)
  File "D:\elizabeth\pytorch-gpu\lib\site-packages\torch\autograd\gradcheck.py", line 1131, in _slow_gradcheck
    raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag))
torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
numerical:tensor([[1.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
        [0.0000, 1.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 1.0000,  ..., 0.0000, 0.0000, 0.0000],
        ...,
        [0.0000, 0.0000, 0.0000,  ..., 1.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 1.0000, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 1.0000]], device='cuda:0')
analytical:tensor([[1., 0., 0.,  ..., 0., 0., 0.],
        [0., 1., 0.,  ..., 0., 0., 0.],
        [0., 0., 1.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 1., 0., 0.],
        [0., 0., 0.,  ..., 0., 1., 0.],
        [0., 0., 0.,  ..., 0., 0., 1.]], device='cuda:0')


----------------------------------------------------------------------
Ran 4 tests in 2.377s

FAILED (errors=2)

By the way, I was able to build and run another Halide generator (e.g., apps/hist) successfully, which makes me suspect the issue lies in how Halide and PyTorch interact with CUDA.
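
In case it helps narrow things down: this machine has two very different GPUs, so a mismatch between the device PyTorch allocates on and the one the Halide runtime launches kernels on seemed worth ruling out. A quick sanity check of what PyTorch sees (standard torch.cuda calls, nothing Halide-specific):

import torch as th

print(th.cuda.device_count())             # 2 on this machine
for i in range(th.cuda.device_count()):
    print(i, th.cuda.get_device_name(i))  # GTX 1070 / RTX 4060
print(th.cuda.current_device())           # the device test.py allocates on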

Could you help investigate this issue?

Thank you!
