CUDA
Make sure you can connect to the Discovery cluster by using an ssh client:
ssh -X <usrid>@discovery.neu.edu
If error messages persist, you can connect directly to one of the login nodes:
ssh -X <usrid>@discovery2.neu.edu or
ssh -X <usrid>@discovery4.neu.edu
Make sure the following modules are loaded in your .bashrc file:
vim ~/.bashrc
Edit to add these lines at the end of the file:
module load slurm-14.11.8
module load gnu-4.8.1-compilers
module load fftw-3.3.3
module load platform-mpi
module load cuda-7.0
Notice that we are loading the module cuda-7.0. This is the one we will use to compile and execute CUDA code. Note: you will need to log out and log back in for these changes to take effect. Make sure that openmpi is not loaded in your .bashrc file. You can also add the following lines to make it easier to use slurm:
alias sq="squeue -u $USER"
alias sc="scancel"
alias sb="sbatch"
alias sa="salloc -N 1 --exclusive"
Requesting a compute node for compilation and/or interactive CUDA runs.
Check systems for availability by using the sinfo command:
sinfo
or
sinfo | grep idle | grep gpu
# To identify gpu compute nodes that are available
Allocate an interactive node by using the following command. Only use the gpu partition for CUDA runs:
salloc -N 1 --exclusive -p <partition name>
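For CUDA runs, that is:
salloc -N 1 --exclusive -p gpu
(or, using the alias defined above, sa -p gpu)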
And then use squeue to identify which compute node was allocated, and ssh into the compute node.
squeue -u $USER
ssh -X compute-<#>
If you followed these steps correctly, you are now connected to a compute node with a GPU (a K20m). To make sure this worked, you can use the nvidia-smi command to print general GPU information.
nvidia-smi
If you are running your code in batch mode, make sure you have the following lines in your batch script (the file you submit with sbatch); a minimal example script is shown below these lines:
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
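For example, a minimal batch script might look like the following (the job name, output file, and executable are placeholders; adapt them to your code):

#!/bin/bash
#SBATCH --job-name=cuda_job
#SBATCH --nodes=1
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --output=cuda_job.%j.out
./<your executable>

Submit it with sbatch <script name> (or the sb alias defined above).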
The example code can be found in the directory
/scratch/gutierrez.jul/HPC/CUDA/
Please copy this folder into your home directory.
cp -r /scratch/gutierrez.jul/HPC/CUDA ~/
Go inside this folder and look at the files that are there. Understand what the Makefile and the .cu files do before moving forward. As you can see, this code performs a vector add on the GPU.
cd ~/CUDA
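For reference, a CUDA vector add follows the general pattern below. This is only a minimal sketch, not the exact contents of the provided .cu file, which may differ in names and in details such as error checking:

#include <stdio.h>
#include <cuda_runtime.h>

// Each thread adds one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Allocate and initialize host arrays.
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Allocate device arrays and copy the inputs to the GPU.
    float *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch one thread per element.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

    // Copy the result back and check one element.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}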
Compile the code using the makefile by executing the following command:
make all
In this example we use nvcc to compile the .cu file.
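If you prefer to compile by hand, the command is roughly the following (assuming the source file is named vecAdd.cu; check the Makefile for the exact flags used, and note that -arch=sm_35 matches the K20m GPUs):
nvcc -O3 -arch=sm_35 -o vecAdd vecAdd.cu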
We can run the program by executing the binary generated by the compilation process:
./vecAdd
An option to run this same command has been added to the Makefile. This can be launched by running:
make run
NVIDIA offers many tools to help profile your code or check it for correctness. These are just two examples to get you started debugging, but I encourage you to look into other tools as well, including nvvp.
cuda-memcheck is a very useful tool for checking whether your code is working correctly. It looks for illegal memory accesses (dereferencing an invalid pointer, indexing past the end of an array, etc.). To run the tool, do the following:
cuda-memcheck <command used to execute the code>
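For the vector add example above, that is:
cuda-memcheck ./vecAdd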
When you run it with your code, are there any errors? If so, fix them.
nvprof is a great tool for profiling your application and understanding what happens during execution and how to improve it. It reports information such as how long it takes to copy data to and from the device, kernel execution time, and a lot more. You can also use it to display specific performance metrics, for instance the efficiency of the memory caches or the memory bandwidth used. That is beyond the scope of this guide, but you are more than welcome to try it out. Run nvprof using this command:
nvprof <command used to execute the code>
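For the vector add example:
nvprof ./vecAdd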
This will print out information such as the kernel execution time, the execution time of different CUDA API calls, and the execution time of the memory copy instructions (host to device and device to host). Now that you have tried it out, implement your own code following this guideline. Feel free to modify the Makefile and the code to suit your needs.
Next, we will use an example code that uses the OpenCV API to read images and then uses CUDA to modify the contents of the image. Follow the same steps in this guide up to compiling the code.
The example code can be found in the directory
/scratch/gutierrez.jul/HPC/transpose/
Please copy this folder into your home directory.
cp -r /scratch/gutierrez.jul/HPC/transpose ~/
Go inside this folder and look at the files that are there. Understand what the Makefile and the files under the src directory do before moving forward. As you can see, this code performs an image transpose on the GPU. It uses the OpenCV interface to read and write images. In order to use this API, we need to load it using the module command:
module load opencv
Make sure the previous command is executed after you have ssh'ed into a compute node from the gpu partition. Then cd into the folder with the files:
cd ~/transpose
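For reference, a naive GPU transpose kernel follows the general pattern below. This is only a minimal sketch; the provided sources may use a tiled shared-memory version, handle multiple channels, or use different names:

// Naive transpose: each thread copies one pixel from (x, y) to (y, x).
// Assumes a single-channel 8-bit image of width x height pixels.
__global__ void transpose(const unsigned char *in, unsigned char *out,
                           int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        out[x * height + y] = in[y * width + x];
}

On the host side, the code reads the image with OpenCV, copies the pixel data to the device, launches a kernel like this one, copies the result back, and writes it out with OpenCV.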
Compile the code using the makefile by executing the following command:
make all
In this example we use nvcc to compile the .cu file.
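The main difference from the vector add example is that the OpenCV libraries have to be linked in. Conceptually the compile step looks something like this (the file and library names here are assumptions; the Makefile has the exact command, and the opencv module may set additional include and library paths):
nvcc -O3 -arch=sm_35 -o trans src/*.cu -lopencv_core -lopencv_highgui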
We can run the program by executing the binary generated by the compilation process:
./trans input/lake.jpg output.jpg
Visualizing the output on the Discovery cluster can be cumbersome (you can do so with X forwarding through ssh), so in general I recommend copying the output file to your own computer. You can do so using scp or FileZilla. Here is an example of using scp to copy the output file from the Discovery cluster to your own computer:
scp <USER>@discovery2.neu.edu:~/transpose/output.jpg .