FastSortFusedNew Function Hangs on Compute Capability 7.5 (Q6000) but Runs Fine on Compute Capability 8.6 (3090/A4500) #55

xiaoc57 · 2024-08-21T07:10:43Z

I converted a C++ codebase into a PyTorch extension. It runs correctly on GPUs with Compute Capability 8.6 (RTX 3090 and A4500), but on a Quadro RTX 6000 (Compute Capability 7.5) the FastSortFusedNew function hangs. The hang is immediate: the function stalls the first time it is entered.
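
One way to turn a hang like this into a repeatable pass/fail check is to run the failing call in a child process with a timeout. The following is only a minimal sketch: `finishes_within`, the snippet string, and the 60-second limit are hypothetical placeholders, not part of the codebase.

```python
import subprocess
import sys


def finishes_within(snippet, timeout_s):
    """Run `snippet` in a fresh Python process; return False if it exceeds timeout_s."""
    try:
        subprocess.run([sys.executable, "-c", snippet], timeout=timeout_s, check=False)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising.
        return False
    return True


if __name__ == "__main__":
    # Placeholder: replace with the import and call that reaches FastSortFusedNew.
    snippet = "print('replace me with the failing call')"
    print("finished" if finishes_within(snippet, 60) else "timed out")
```

Running the same snippet on the 8.6 and 7.5 machines should then give a clear "finished" versus "timed out" result without blocking the calling shell.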

Details:

PyTorch Version: 2.1.0
CUDA Version: 11.8
Operating System (working): Ubuntu 20.04
Operating System (failing): Ubuntu 18.04 or 22.04
I suspect the issue is not related to the OS version, since I hit the same problem on both Ubuntu 18.04 and 22.04. The same codebase runs without issues on GPUs with Compute Capability 8.6.
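
For anyone comparing machines, the details above can be collected with a short script. This is a rough sketch, assuming PyTorch is importable on the machine; `format_capability` and `collect_environment` are hypothetical helper names, and the GPU fields stay empty when CUDA is not available.

```python
import importlib.util
import platform


def format_capability(capability):
    """Turn a (major, minor) tuple such as (7, 5) into the string '7.5'."""
    major, minor = capability
    return f"{major}.{minor}"


def collect_environment():
    """Gather OS, PyTorch, CUDA, and per-GPU compute capability information."""
    info = {"os": platform.platform(), "torch": None, "cuda": None, "devices": []}
    if importlib.util.find_spec("torch") is None:
        return info  # PyTorch is not installed; report the OS only
    import torch

    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda  # CUDA version PyTorch was built with
    if torch.cuda.is_available():
        for index in range(torch.cuda.device_count()):
            name = torch.cuda.get_device_name(index)
            capability = format_capability(torch.cuda.get_device_capability(index))
            info["devices"].append((name, capability))
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Comparing this output between the working and failing machines makes it easier to confirm that the GPU's compute capability is the only relevant difference.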

Has anyone experienced a similar issue, or does anyone have insights into why this might be happening?
