nrs with more than 1 cpu rank #248
tonyzahtila
started this conversation in
General
Replies: 2 comments
-
In NekRS, each MPI rank binds to one GPU. You will need two GPUs if you use "nrsmpi ethier 2".
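To illustrate why the second rank fails, here is a minimal sketch. It assumes each rank selects the device whose index equals its node-local rank; that mapping is an assumption for illustration, not taken from the NekRS source. The names `pick_device`, `launch`, and `InvalidDeviceError` are hypothetical.

```python
class InvalidDeviceError(RuntimeError):
    """Hypothetical stand-in for CUDA_ERROR_INVALID_DEVICE."""


def pick_device(local_rank: int, num_visible_gpus: int) -> int:
    """Return the device index for a rank.

    Assumption: device index == node-local rank (one GPU per rank).
    """
    if not 0 <= local_rank < num_visible_gpus:
        raise InvalidDeviceError(
            f"rank {local_rank} wants device {local_rank}, "
            f"but only {num_visible_gpus} GPU(s) are visible"
        )
    return local_rank


def launch(num_ranks: int, num_visible_gpus: int) -> list[int]:
    """Assign a device to every rank; raises if any rank has no GPU."""
    return [pick_device(rank, num_visible_gpus) for rank in range(num_ranks)]


if __name__ == "__main__":
    # Two ranks, two GPUs: every rank gets its own device.
    print(launch(num_ranks=2, num_visible_gpus=2))

    # Two ranks, one GPU: rank 1 has no device to bind to.
    try:
        launch(num_ranks=2, num_visible_gpus=1)
    except InvalidDeviceError as err:
        print("failed:", err)
```

Under this assumption, rank 0 binds to the only visible GPU and rank 1 asks for a device that does not exist, which would be consistent with the reported error. Either request as many GPUs as ranks in the Slurm script, or run with a single rank.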
-
Thank you, this has solved this particular problem for me.
Tony
-
If I run with more than one CPU rank, e.g. "nrsmpi ethier 2" (with 1 GPU requested by the Slurm script), I get the following error:
Initializing device
terminate called after throwing an instance of 'occa::exception'
what():
---[ Error ]--------------------------------------------------------------------
File : /data/cephfs/punim0524/nek/nekRS/3rd_party/occa/src/modes/cuda/device.cpp
Line : 30
Function : device
Message : Device: Creating Device
CUDA Error [ 101 ]: CUDA_ERROR_INVALID_DEVICE
Stack
10 occa::error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
9 occa::cuda::error(cudaError_enum, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
8 occa::cuda::device::device(occa::properties const&)
7 occa::cuda::cudaMode::newDevice(occa::properties const&)
6 occa::device::setup(occa::properties const&)
5 occaDeviceConfig(setupAide&, ompi_communicator_t*)
4 nekrs::setup(ompi_communicator_t*, int, int, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
3 /home/tzahtila/.local/nekrs/bin/nekrs()
2 /lib64/libc.so.6(__libc_start_main+0xf5)
1 /home/tzahtila/.local/nekrs/bin/nekrs()
Tony