This is a quick guide to running PyTorch with ROCm support inside a provided docker image. It assumes a .deb-based system. See ROCm install for supported operating systems and general information on the ROCm software stack.
ROCm version 2.1 is currently required.
A Vega10 / gfx900 generation discrete graphics card is required (Vega56, Vega64, or MI25).
The image contains hipified PyTorch source, a clone of the PyTorch examples, and has PyTorch for gfx900 installed.
-
Install or update rocm-dev on the host system:
sudo apt-get install rocm-dev
or
sudo apt-get update
sudo apt-get upgrade
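Before moving on to the container steps, it may help to confirm that the host exposes the device nodes that the docker run commands in this guide pass through with --device. The following is a minimal sketch, assuming that the presence of /dev/kfd and /dev/dri is a reasonable sign that the ROCm kernel driver is loaded; the function name check_rocm_device_nodes is hypothetical and not part of ROCm.

```python
import os

# Device nodes passed to the container with --device in this guide.
# Assumption: their presence on the host suggests the ROCm driver is loaded.
ROCM_DEVICE_NODES = ("/dev/kfd", "/dev/dri")


def check_rocm_device_nodes(paths=ROCM_DEVICE_NODES):
    """Return a dict mapping each device path to True if it exists on this host."""
    return {path: os.path.exists(path) for path in paths}


if __name__ == "__main__":
    for path, present in sorted(check_rocm_device_nodes().items()):
        print("{}: {}".format(path, "found" if present else "missing"))
```

A missing node does not say why the driver is absent; it only indicates that the container would start without access to the GPU.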
-
Obtain docker image:
docker pull rocm/pytorch:rocm2.1_ubuntu16.04_pytorch_gfx900
-
Start a docker container using the downloaded image:
sudo docker run -it -v $HOME:/data --privileged --rm --device=/dev/kfd --device=/dev/dri --group-add video rocm/pytorch:rocm2.1_ubuntu16.04_pytorch_gfx900
Note: This will mount your host home directory on /data in the container.
-
Confirm working setup:
cd ~/pytorch
PYTORCH_TEST_WITH_ROCM=1 python test/run_test.py --verbose
No tests will fail if the setup is correct and the hardware is supported.
-
Run individual example: MNIST
cd ~/examples/mnist
Follow the instructions in README.md, in this case:
pip install -r requirements.txt
python main.py
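If a training run such as this one appears to fall back to the CPU, a quick check of what PyTorch can see may help. The sketch below relies on the fact that ROCm builds of PyTorch expose AMD GPUs through the torch.cuda namespace; the helper name describe_gpu is hypothetical and only for illustration.

```python
def describe_gpu():
    """Return a one-line summary of what PyTorch can see, or why it cannot."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not importable in this environment"
    if not torch.cuda.is_available():
        return "PyTorch imported, but no GPU is visible"
    # ROCm builds of PyTorch reuse the torch.cuda namespace for AMD GPUs.
    return "GPU 0: {}".format(torch.cuda.get_device_name(0))


if __name__ == "__main__":
    print(describe_gpu())
```

The import sits inside the function so that the script still produces a readable message when run outside the container.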
-
Run individual example: Try ImageNet training
cd ~/examples/imagenet
Follow the instructions in README.md.
-
Obtain the Caffe2 docker image:
docker pull rocm/pytorch:rocm2.1_caffe2
This image has Caffe2 installed and works for the gfx900 and gfx906 architectures.
-
Start a docker container using the downloaded image:
sudo docker run -it -v $HOME:/data --privileged --rm --device=/dev/kfd --device=/dev/dri --group-add video rocm/pytorch:rocm2.1_caffe2
Note: This will mount your host home directory on /data in the container.
-
Confirm working setup:
cd ~ && python -c 'from caffe2.python import core' 2>/dev/null && echo "Success" || echo "Failure"
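The one-liner above discards stderr, so a failure gives no hint about its cause. As a rough Python equivalent that keeps the reason visible, the following sketch may be useful; the function name caffe2_status is hypothetical, and the module path is the one used in the command above.

```python
import importlib


def caffe2_status(module_name="caffe2.python.core"):
    """Return ("Success", "") if the module imports, else ("Failure", reason)."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # any import-time error counts as a failure
        # Unlike the shell one-liner, keep the reason for debugging.
        return "Failure", repr(exc)
    return "Success", ""


if __name__ == "__main__":
    status, reason = caffe2_status()
    print(status)
    if reason:
        print(reason)
```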
-
Running benchmarks:
The Caffe2 benchmarking script supports the following networks: MLP, AlexNet, OverFeat, VGGA, and Inception. To benchmark one of them, run the command below from the PyTorch home directory, replacing <name_of_the_network> with the name of the network.
cd /pytorch
python caffe2/python/convnet_benchmarks.py --batch_size 64 --model <name_of_the_network> --engine MIOPEN
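To run the benchmark for every listed network in turn, the command above can be wrapped in a small loop. This is a sketch only: it reuses the flags shown above unchanged, and the helper names benchmark_command and run_all are hypothetical. It defaults to a dry run that prints the commands without executing them.

```python
import subprocess

# Network names listed in the text above.
NETWORKS = ["MLP", "AlexNet", "OverFeat", "VGGA", "Inception"]


def benchmark_command(network, batch_size=64, engine="MIOPEN"):
    """Build the argument list for one convnet_benchmarks.py run."""
    return [
        "python",
        "caffe2/python/convnet_benchmarks.py",
        "--batch_size", str(batch_size),
        "--model", network,
        "--engine", engine,
    ]


def run_all(dry_run=True):
    """Print each command; also execute it when dry_run is False."""
    exit_codes = {}
    for network in NETWORKS:
        cmd = benchmark_command(network)
        print(" ".join(cmd))
        if not dry_run:
            # The script path is relative: run from the PyTorch home directory.
            exit_codes[network] = subprocess.call(cmd)
    return exit_codes


if __name__ == "__main__":
    run_all(dry_run=True)
```

Calling run_all(dry_run=False) from the PyTorch home directory executes the benchmarks one after another and returns each exit code keyed by network name.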
-
Running example scripts:
Please refer to the example scripts in caffe2/python/examples. It currently has resnet50_trainer.py, which can run ResNets and ResNeXts with various layer, group, and depth configurations, and char_rnn.py, which uses RNNs for character-level prediction. An example resnet50_trainer command would look like this:
python caffe2/python/examples/resnet50_trainer.py --train_data <path_to_train_data> --test_data <path_to_test_data> --batch_size 32 --epoch_size 32000 --num_epochs 10 --num_gpus 2
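To get a feel for what the numbers in this command imply, the arithmetic can be sketched as follows. This rests on an assumption that this guide does not state: that --batch_size is the total batch across all GPUs and is split evenly among them. The helper name per_epoch_plan is hypothetical.

```python
def per_epoch_plan(epoch_size, batch_size, num_gpus):
    """Derive the per-GPU batch size and the iterations per epoch.

    Assumption (not confirmed by this guide): batch_size is the total
    batch across all GPUs and is divided evenly among them.
    """
    if batch_size % num_gpus != 0:
        raise ValueError("batch_size must be divisible by num_gpus")
    batch_per_gpu = batch_size // num_gpus
    iters_per_epoch = epoch_size // batch_size
    return batch_per_gpu, iters_per_epoch


# Values taken from the example command above.
print(per_epoch_plan(epoch_size=32000, batch_size=32, num_gpus=2))  # (16, 1000)
```

Under that assumption, the example command would process 16 images per GPU per iteration and 1000 iterations per epoch.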