
HelloPyTorch CUDA Operator Test Fails on Windows #8549

Open
i-chiang opened this issue Jan 8, 2025 · 0 comments

i-chiang commented Jan 8, 2025

Environment:

  • OS: Windows 11
  • PyTorch version: 1.13.1+cu117
  • Python version: 3.7.12
  • CUDA version: 11.7
  • GPU models and configuration: NVIDIA GeForce GTX 1070, NVIDIA GeForce RTX 4060
  • Halide version: Release v19.0.0 (Halide-19.0.0-x86-64-windows)

I built add_generator.cpp with Visual Studio 2022 and ran it to generate the corresponding static library, C header, and PyTorch wrapper.
I also added the following flags in setup.py to work around compilation errors:

compile_args = ["/std:c++17", "/sdl-"]
link_args = ["/FORCE:MULTIPLE"]
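
For context, here is roughly how those flags are wired into the build (a minimal sketch assuming the torch.utils.cpp_extension-based setup.py from apps/HelloPyTorch; the module name halide_ops and the source file name are illustrative, not necessarily the exact ones the app uses):

from setuptools import setup
from torch.utils.cpp_extension import CUDAExtension, BuildExtension

compile_args = ["/std:c++17", "/sdl-"]  # MSVC: enable C++17, disable SDL checks
link_args = ["/FORCE:MULTIPLE"]         # MSVC linker: tolerate multiply defined symbols

setup(
    name="halide_ops",                       # illustrative module name
    ext_modules=[
        CUDAExtension(
            name="halide_ops",
            sources=["add_pytorch.cpp"],     # generated PyTorch wrapper (assumed file name)
            extra_compile_args=compile_args,
            extra_link_args=link_args,
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)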

When I run test.py, the Halide PyTorch CPU operator passes all tests, but the Halide PyTorch CUDA operator fails gradcheck. I also observed that the CUDA operator produces different numerical tensors each time the test is run.
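
A check along the following lines reproduces that nondeterminism (a minimal sketch: halide_ops.add is an assumed name for the generated CUDA binding, and the actual binding imported by test.py may differ):

import torch as th
import halide_ops  # the compiled extension from setup.py (assumed name)

a = th.randn(8, 8, device="cuda", dtype=th.float64)
b = th.randn(8, 8, device="cuda", dtype=th.float64)

# The generated Halide wrappers write into a preallocated output tensor.
out1 = th.empty_like(a)
out2 = th.empty_like(a)
halide_ops.add(a, b, out1)
halide_ops.add(a, b, out2)
th.cuda.synchronize()

print(th.equal(out1, out2))  # prints False when the op is nondeterministic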

Here are the error messages I received:

Testing Halide PyTorch CPU operator...
.Double-precision mode, backward_op: add_grad
Test ran successfully: difference is 0.0
.Double-precision mode, backward_op: add_halidegrad
Test ran successfully: difference is 0.0
.Testing Halide PyTorch CPU operator...
.Single-precision mode, backward_op: add_grad
Test ran successfully: difference is 0.0
.Single-precision mode, backward_op: add_halidegrad
Test ran successfully: difference is 0.0
.Testing Halide PyTorch CUDA operator...
.Double-precision mode, backward_op: add_grad
ETesting Halide PyTorch CUDA operator...
.Single-precision mode, backward_op: add_grad
E
======================================================================
ERROR: test_gpu_double (__main__.TestAdd)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test.py", line 32, in test_gpu_double
    self._test_add(is_cuda=True, is_double=True)
  File "test.py", line 72, in _test_add
    res = th.autograd.gradcheck(add, [self.a, self.b], eps=1e-2)
  File "D:\elizabeth\pytorch-gpu\lib\site-packages\torch\autograd\gradcheck.py", line 1418, in gradcheck
    return _gradcheck_helper(**args)
  File "D:\elizabeth\pytorch-gpu\lib\site-packages\torch\autograd\gradcheck.py", line 1435, in _gradcheck_helper
    check_undefined_grad=check_undefined_grad)
  File "D:\elizabeth\pytorch-gpu\lib\site-packages\torch\autograd\gradcheck.py", line 1076, in _gradcheck_real_imag
    rtol, atol, check_grad_dtypes, nondet_tol)
  File "D:\elizabeth\pytorch-gpu\lib\site-packages\torch\autograd\gradcheck.py", line 1131, in _slow_gradcheck
    raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag))
torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
numerical:tensor([[   1.0000,    0.0000,    0.0000,  ...,    0.0000,    0.0000,    0.0000],
        [   0.0000,    1.0000,    0.0000,  ...,    0.0000,    0.0000,    0.0000],
        [   0.0000,    0.0000,    1.0000,  ...,    0.0000,    0.0000,    0.0000],
        ...,
        [   0.0000,    0.0000,    0.0000,  ...,    1.0000,    0.0000,    0.0000],
        [-149.5000,    0.0000,    0.0000,  ...,    0.0000,    0.0000,    0.0000],
        [ 150.5000,  200.0000,  200.0000,  ...,  200.0000,  200.0000,  200.5000]], device='cuda:0', dtype=torch.float64)
analytical:tensor([[1., 0., 0.,  ..., 0., 0., 0.],
        [0., 1., 0.,  ..., 0., 0., 0.],
        [0., 0., 1.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 1., 0., 0.],
        [0., 0., 0.,  ..., 0., 1., 0.],
        [0., 0., 0.,  ..., 0., 0., 1.]], device='cuda:0', dtype=torch.float64)


======================================================================
ERROR: test_gpu_single (__main__.TestAdd)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test.py", line 27, in test_gpu_single
    self._test_add(is_cuda=True, is_double=False)
  File "test.py", line 72, in _test_add
    res = th.autograd.gradcheck(add, [self.a, self.b], eps=1e-2)
  File "D:\elizabeth\pytorch-gpu\lib\site-packages\torch\autograd\gradcheck.py", line 1418, in gradcheck
    return _gradcheck_helper(**args)
  File "D:\elizabeth\pytorch-gpu\lib\site-packages\torch\autograd\gradcheck.py", line 1435, in _gradcheck_helper
    check_undefined_grad=check_undefined_grad)
  File "D:\elizabeth\pytorch-gpu\lib\site-packages\torch\autograd\gradcheck.py", line 1076, in _gradcheck_real_imag
    rtol, atol, check_grad_dtypes, nondet_tol)
  File "D:\elizabeth\pytorch-gpu\lib\site-packages\torch\autograd\gradcheck.py", line 1131, in _slow_gradcheck
    raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag))
torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
numerical:tensor([[1.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
        [0.0000, 1.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 1.0000,  ..., 0.0000, 0.0000, 0.0000],
        ...,
        [0.0000, 0.0000, 0.0000,  ..., 1.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 1.0000, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 1.0000]], device='cuda:0')
analytical:tensor([[1., 0., 0.,  ..., 0., 0., 0.],
        [0., 1., 0.,  ..., 0., 0., 0.],
        [0., 0., 1.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 1., 0., 0.],
        [0., 0., 0.,  ..., 0., 1., 0.],
        [0., 0., 0.,  ..., 0., 0., 1.]], device='cuda:0')


----------------------------------------------------------------------
Ran 4 tests in 2.377s

FAILED (errors=2)

By the way, I was able to build and run another Halide generator (e.g., apps/hist) successfully, which makes me suspect the issue lies in how Halide and PyTorch interact with CUDA.
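
In case it helps narrow things down: this machine has two very different GPUs, so a mismatch between the device PyTorch allocates on and the one the Halide runtime launches kernels on seemed worth ruling out. A quick sanity check of what PyTorch sees (standard torch.cuda calls, nothing Halide-specific):

import torch as th

print(th.cuda.device_count())             # 2 on this machine
for i in range(th.cuda.device_count()):
    print(i, th.cuda.get_device_name(i))  # GTX 1070 / RTX 4060
print(th.cuda.current_device())           # the device test.py allocates on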

Could you help investigate this issue?

Thank you!
