Skip to content

Latest commit

 

History

History
60 lines (45 loc) · 3.14 KB

pytorch.md

File metadata and controls

60 lines (45 loc) · 3.14 KB

PyTorch

General remarks

This is a quick guide to run PyTorch with ROCm support inside a provided docker image. Assumes a .deb based system. See ROCm install for supported operating systems and general information on the ROCm software stack.

A ROCm install version 2.1 is required currently.

A Vega10 / gfx900 generation discrete graphics card is required (Vega56, Vega64, or MI25).

The image contains hipified PyTorch source, a clone of the PyTorch examples, and has PyTorch for gfx900 installed.

  1. Install or update rocm-dev on the host system:
    sudo apt-get install rocm-dev
    or
    sudo apt-get update
    sudo apt-get upgrade

  2. Obtain docker image:
    docker pull rocm/pytorch:rocm2.1_ubuntu16.04_pytorch_gfx900

  3. Start a docker container using the downloaded image:
    sudo docker run -it -v $HOME:/data --privileged --rm --device=/dev/kfd --device=/dev/dri --group-add video rocm/pytorch:rocm2.1_ubuntu16.04_pytorch_gfx900
    Note: This will mount your host home directory on /data in the container.

  4. Confirm working setup:
    cd ~/pytorch PYTORCH_TEST_WITH_ROCM=1 python test/run_test.py --verbose
    No tests will fail if the setup is correct and hardware is supported.

  5. Run individual example: MNIST
    cd ~/examples/mnist
    Follow instructions in README.md, in this case:
    pip install -r requirements.txt python main.py

  6. Run individual example: Try ImageNet training
    cd ~/examples/imagenet
    Follow instructions in README.md.

Running PyTorch with Caffe2 backend

  1. Obtain docker image:
    docker pull rocm/pytorch:rocm2.1_caffe2
    This image has Caffe2 installed and works for gfx900 and gfx906 architectures.

  2. Start a docker container using the downloaded image:
    sudo docker run -it -v $HOME:/data --privileged --rm --device=/dev/kfd --device=/dev/dri --group-add video rocm/pytorch:rocm2.1_caffe2
    Note: This will mount your host home directory on /data in the container.

  3. Confirm working setup:
    cd ~ && python -c 'from caffe2.python import core' 2>/dev/null && echo "Success" || echo "Failure"

  4. Runing benchmarks:
    Caffe2 benchmarking script supports the following networks *MLP, AlexNet, OverFeat, VGGA, Inception. To run benchmarks for networks MLP, AlexNet, OverFeat, VGGA, Inception run the command from pytorch home directory replacing <name_of_the_network> with one of the networks.
    cd /pytorch
    python caffe2/python/convnet_benchmarks.py --batch_size 64 --model <name_of_the_network> --engine MIOPEN

  5. Running example scripts:
    Please refer to the example scripts in caffe2/python/examples. It currently has resnet50_trainer.py which can run ResNet's, ResNeXt's with various layer, groups, depth configurations and char_rnn.py which uses RNNs to do character level prediction. An example resnet50_trainer command would like this:
    python caffe2/python/examples/resnet50_trainer.py --train_data <path_to_train_data> --test_data <path_to_test_data> --batch_size 32 --epoch_size 32000 --num_epochs 10 --num_gpus 2