CUDA
Make sure you can connect to the Discovery cluster by using an ssh client:
ssh -X <usrid>@discovery.neu.edu
If error messages persist, you can connect directly to one of the login nodes:
ssh -X <usrid>@discovery2.neu.edu or
ssh -X <usrid>@discovery4.neu.edu
Make sure the following modules are loaded in your .bashrc file:
vim ~/.bashrc
Edit to add these lines at the end of the file:
module load slurm-14.11.8
module load gnu-4.8.1-compilers
module load fftw-3.3.3
module load platform-mpi
module load cuda-7.0
Notice that we are loading the module cuda-7.0. This is the one we will use to compile and execute CUDA code. Note: you will need to log out and log back in for these changes to take effect. Make sure that openmpi is not loaded in your .bashrc file. You can also add the following lines to make it easier to use slurm:
alias sq="squeue -u $USER"
alias sc="scancel"
alias sb="sbatch"
alias sa="salloc -N 1 --exclusive"
Requesting a compute node for compilation and/or interactive CUDA runs.
Check systems for availability by using the sinfo command:
sinfo
or
sinfo | grep idle | grep gpu
# To identify gpu compute nodes that are available
Allocate an interactive node by using the following command. Only use the gpu partition for CUDA runs:
salloc -N 1 --exclusive -p <partition name>
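For CUDA runs, that is:
salloc -N 1 --exclusive -p gpu
(or, using the alias defined above, sa -p gpu)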
And then use squeue to identify which compute node was allocated, and ssh into the compute node.
squeue -u $USER
ssh -X compute-<#>
If you followed these steps correctly, you are now connected to a compute node with a GPU (a K20m). To make sure this worked, you can use the nvidia-smi command to print general GPU information.
nvidia-smi
If you are running your code in batch mode, make sure you have the following lines in your batch script (the file you submit with sbatch); a minimal example script is shown below these lines:
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
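For example, a minimal batch script might look like the following (the job name, output file, and executable are placeholders; adapt them to your code):

#!/bin/bash
#SBATCH --job-name=cuda_job
#SBATCH --nodes=1
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --output=cuda_job.%j.out
./<your executable>

Submit it with sbatch <script name> (or the sb alias defined above).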
The example code can be found in the directory
/scratch/gutierrez.jul/HPC/CUDA/
Please copy this folder into your home directory.
cp -r /scratch/gutierrez.jul/HPC/CUDA ~/
Go inside this folder and look at the files that are there. Understand what the Makefile and the .cu files do before moving forward. As you can see, this code performs a vector add on the GPU.
cd ~/CUDA
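For reference, a CUDA vector add follows the general pattern below. This is only a minimal sketch, not the exact contents of the provided .cu file, which may differ in names and in details such as error checking:

#include <stdio.h>
#include <cuda_runtime.h>

// Each thread adds one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Allocate and initialize host arrays.
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Allocate device arrays and copy the inputs to the GPU.
    float *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch one thread per element.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

    // Copy the result back and check one element.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}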
Compile the code using the makefile by executing the following command:
make all
In this example we use nvcc to compile the .cu file.
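If you prefer to compile by hand, the command is roughly the following (assuming the source file is named vecAdd.cu; check the Makefile for the exact flags used, and note that -arch=sm_35 matches the K20m GPUs):
nvcc -O3 -arch=sm_35 -o vecAdd vecAdd.cu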
We can run the program by executing the binary generated by the compilation process:
./vecAdd
An option to run this same command has been added to the Makefile. This can be launched by running:
make run
NVIDIA offers many tools to help profile your code or check it for correctness. These are just two examples to get you started debugging, but I encourage you to look into other tools as well, including nvvp.
cuda-memcheck is a very useful tool for checking whether your code is working correctly. It looks for illegal memory accesses (dereferencing an invalid pointer, indexing past the end of an array, etc.). To run the tool, do the following:
cuda-memcheck <command used to execute the code>
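For the vector add example above, that is:
cuda-memcheck ./vecAdd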
When you run it with your code, are there any errors? If so, fix them.
nvprof is a great tool for profiling your application and understanding what happens during execution and how to improve it. It reports information such as how long it takes to copy data to and from the device, kernel execution time, and a lot more. You can also use it to display specific performance metrics, for instance the efficiency of the memory caches or the memory bandwidth used. That is beyond the scope of this guide, but you are more than welcome to try it out. Run nvprof using this command:
nvprof <command used to execute the code>
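For the vector add example:
nvprof ./vecAdd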
This will print out information such as the kernel execution time, the execution time of different CUDA API calls, and the execution time of the memory copy instructions (host to device and device to host). Now that you have tried it out, implement your own code following this guideline. Feel free to modify the Makefile and the code to suit your needs.
Next, we will use an example code that uses the OpenCV API to read images and then uses CUDA to modify the contents of the image. Follow the same steps in this guide up to compiling the code.
The example code can be found in the directory
/scratch/gutierrez.jul/HPC/transpose/
Please copy this folder into your home directory.
cp -r /scratch/gutierrez.jul/HPC/transpose ~/
Go inside this folder and look at the files that are there. Understand what the Makefile and the files under the src directory do before moving forward. As you can see, this code performs an image transpose on the GPU. It uses the OpenCV interface to read and write images. In order to use this API, we need to load it using the module command:
module load opencv
Make sure the previous command is executed after you have ssh'ed into a compute node from the gpu partition. Then cd into the folder with the files:
cd ~/transpose
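For reference, a naive GPU transpose kernel follows the general pattern below. This is only a minimal sketch; the provided sources may use a tiled shared-memory version, handle multiple channels, or use different names:

// Naive transpose: each thread copies one pixel from (x, y) to (y, x).
// Assumes a single-channel 8-bit image of width x height pixels.
__global__ void transpose(const unsigned char *in, unsigned char *out,
                           int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        out[x * height + y] = in[y * width + x];
}

On the host side, the code reads the image with OpenCV, copies the pixel data to the device, launches a kernel like this one, copies the result back, and writes it out with OpenCV.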
Compile the code using the makefile by executing the following command:
make all
In this example we use nvcc to compile the .cu file.
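The main difference from the vector add example is that the OpenCV libraries have to be linked in. Conceptually the compile step looks something like this (the file and library names here are assumptions; the Makefile has the exact command, and the opencv module may set additional include and library paths):
nvcc -O3 -arch=sm_35 -o trans src/*.cu -lopencv_core -lopencv_highgui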
We can run the program by executing the binary generated by the compilation process:
./trans input/lake.jpg output.jpg
Visualizing the output on the Discovery cluster can be cumbersome (you can do so with X forwarding through ssh), so in general I recommend copying the output file to your own computer. You can do so using scp or FileZilla. Here is an example of using scp to copy the output file from the Discovery cluster to your own computer:
scp <USER>@discovery2.neu.edu:~/transpose/output.jpg .