nekrs sample test fails with CUDA_ERROR_INVALID_DEVICE #438
Replies: 3 comments 11 replies
-
Does your machine have two GPUs?
-
I assume you can run on a single GPU without issues?
-
Finally, after setting the OCCA mode to CUDA, I got the same nvcc error and, as you mentioned, was able to track it down to a missing include directory. Let me know what I should do after this step. Should I recompile? The nvcc command was:
nvcc -O3 --fmad=true -lineinfo -arch=sm_52 -fatbin -Xptxas -v -I/program/nekrs-22.0/nekrs/include -I/program/nekrs-22.0/gcc/8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include -I/program/nekrs-22.0/gcc/8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include-fixed -L/program/nekrs-22.0/nekrs/lib -locca -x cu /enc/udev_DHqBeb/work/shared/examples/turbPipe/.cache/occa/cache/dc511cf8ce7411de/source.cpp -o /enc/udev_DHqBeb/work/shared/examples/turbPipe/.cache/occa/cache/dc511cf8ce7411de/binary
-
Running the sample test (mpirun -np 2 nekrs --setup turbPipe.par) on a GPU server produced the following error:
Initializing device
terminate called after throwing an instance of 'occa::exception'
what():
---[ Error ]--------------------------------------------------------------------
File : /program/nekrs-22.0/nekRS-22.0/3rd_party/occa/src/occa/internal/modes/cuda/device.cpp
Line : 30
Function : device
Message : Device: Creating Device
CUDA Error [ 101 ]: CUDA_ERROR_INVALID_DEVICE
Stack
12 occa::error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
11 occa::cuda::error(cudaError_enum, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
10 occa::cuda::device::device(occa::json const&)
9 occa::cuda::cudaMode::newDevice(occa::json const&)
8 occa::device::setup(occa::json const&)
7 occa::device::setup(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
6 device_t::device_t(setupAide&, comm_t&)
5 platform_t::platform_t(setupAide&, int, int)
4 nekrs::setup(int, int, int, int, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int)
3 nekrs()
2 /lib64/libc.so.6(__libc_start_main+0xf5)
1 nekrs()