- Tested on ZCU102, Alveo U50
- Tools used: PyTorch 1.4 & Vitis AI 1.4
- Dataset: MNIST handwritten digits
- Network: Custom CNN
This tutorial introduces the user to the Vitis AI PyTorch design flow and illustrates how to go from a Python description of the network model to running a compiled model on a Xilinx evaluation board.
The application code in this example design is written in Python and uses the VART runtime.
We will run the following steps:
- Training and evaluation of a small custom convolutional neural network using PyTorch 1.4
- Quantization of the floating-point model and evaluation of the quantized model.
- Compilation of the quantized model to create the .xmodel files ready for execution on the DPU accelerator IP.
- Download and execution of the application on the ZCU102 and Alveo U50 evaluation boards.
This tutorial assumes the user is familiar with Python3 and PyTorch, and has some knowledge of machine learning principles.
The MNIST handwritten digits dataset is a publicly available dataset that contains a total of 70,000 8-bit grayscale images, each 28 x 28 pixels. The complete dataset is normally divided into 60,000 images for training and 10,000 images for validation. The dataset is considered to be the 'hello world' of machine learning and makes a simple introduction to the complete Xilinx Vitis-AI flow.
The convolutional neural network in this design has deliberately been kept as simple as possible and consists of just four layers of 2D convolution interspersed with batch normalization and ReLU activation. The network is described in the common.py Python script.
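For illustration, here is a minimal PyTorch sketch of a four-layer convolutional network of this kind. The channel counts, kernel sizes and strides are illustrative assumptions; the authoritative definition is in common.py:

```python
import torch.nn as nn

class CNN(nn.Module):
    """Four 2D convolution layers; the first three are each followed by
    batch normalization and ReLU, the last produces the 10 class scores."""
    def __init__(self):
        super().__init__()
        self.network = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2),   # 28x28 -> 13x13
            nn.BatchNorm2d(16), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, kernel_size=3, stride=2),  # 13x13 -> 6x6
            nn.BatchNorm2d(32), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2),  # 6x6 -> 2x2
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, 10, kernel_size=2)             # 2x2 -> 1x1
        )

    def forward(self, x):
        x = self.network(x)           # shape (N, 10, 1, 1)
        return x.view(x.size(0), -1)  # shape (N, 10)
```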
The host machine has several requirements that need to be met before we begin. You will need:

- An x86 host machine with a supported OS and either the CPU or GPU version of the Vitis-AI docker installed - see System Requirements.
- Docker installed on the host machine and the Vitis-AI CPU or GPU docker image built - see Getting Started.
- A GPU card suitable for training is recommended, but the training in this tutorial is quite simple and a CPU can be used.
- If you plan to use the ZCU102 evaluation board, it should be prepared with the board image as per the Step 2: Setup the Target instructions. Hints on how to connect the various cables to the ZCU102 are also available here.
- For the Alveo U50, follow the Setup Alveo Accelerator Card instructions.
For more details, refer to the latest version of the Vitis AI User Guide (UG1414).
- Copy the repository by doing either of the following:
  - Download the repository as a ZIP file to the host machine, and then unzip the archive.
  - From a terminal, use the `git clone` command.
- Open a Linux terminal, `cd` to the repository folder, and then `cd` to the `files` folder.
- Start the Vitis AI GPU docker:

```shell
# navigate to the tutorial folder
cd <path_to_tutorial>/files

# start the GPU docker container
./docker_run.sh xilinx/vitis-ai-gpu:latest
```
The docker container will start and after accepting the license agreement, you should see something like this in the terminal:
```shell
==========================================
__ ___ _ _ _____
\ \ / (_) | (_) /\ |_ _|
\ \ / / _| |_ _ ___ ______ / \ | |
\ \/ / | | __| / __|______/ /\ \ | |
\ / | | |_| \__ \ / ____ \ _| |_
\/ |_|\__|_|___/ /_/ \_\_____|
==========================================
Docker Image Version: latest
Build Date: 2021-08-04
VAI_ROOT: /opt/vitis_ai
For TensorFlow 1.15 Workflows do:
conda activate vitis-ai-tensorflow
For Caffe Workflows do:
conda activate vitis-ai-caffe
For Neptune Workflows do:
conda activate vitis-ai-neptune
For PyTorch Workflows do:
conda activate vitis-ai-pytorch
For TensorFlow 2.3 Workflows do:
conda activate vitis-ai-tensorflow2
For Darknet Optimizer Workflows do:
conda activate vitis-ai-optimizer_darknet
For Caffe Optimizer Workflows do:
conda activate vitis-ai-optimizer_caffe
For TensorFlow 1.15 Optimizer Workflows do:
conda activate vitis-ai-optimizer_tensorflow
Vitis-AI /workspace >
```
💡 If you get a "Permission Denied" error when starting the docker container, it is almost certainly because the `docker_run.sh` script is not set to be executable. You can fix this by running the following command:

```shell
chmod +x docker_run.sh
```
Activate the PyTorch conda environment with `conda activate vitis-ai-pytorch` and you should see the prompt change to indicate that the environment is active:

```shell
Vitis-AI /workspace > conda activate vitis-ai-pytorch
(vitis-ai-pytorch) Vitis-AI /workspace >
```
The remainder of this README describes each step of the tutorial individually; however, a shell script called `run_all.sh` is provided which will run the complete flow:

```shell
(vitis-ai-pytorch) Vitis-AI /workspace > source run_all.sh
```
To run step 0:

```shell
(vitis-ai-pytorch) Vitis-AI /workspace > export BUILD=./build
(vitis-ai-pytorch) Vitis-AI /workspace > export LOG=${BUILD}/logs
(vitis-ai-pytorch) Vitis-AI /workspace > mkdir -p ${LOG}
(vitis-ai-pytorch) Vitis-AI /workspace > python -u train.py -d ${BUILD} 2>&1 | tee ${LOG}/train.log
```
The `train.py` script will execute the training of the CNN and save the trained floating-point model as a .pth file called `f_model.pth` in the ./build/float_model folder.
The script will first check how many GPUs are available; by default it selects GPU #0 as this is usually the fastest. If you wish to select a different GPU, modify the following line:

```python
device = torch.device('cuda:0')
```

If no GPU is available, the CPU will be selected as the execution device.
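This is a minimal sketch of that device-selection logic using the standard torch.cuda API (the exact code lives in train.py):

```python
import torch

# Prefer GPU #0 when CUDA is available, otherwise fall back to the CPU
if torch.cuda.is_available():
    print('Available GPUs:', torch.cuda.device_count())
    device = torch.device('cuda:0')  # change the index to select another GPU
else:
    device = torch.device('cpu')
```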
The complete list of command line arguments of `train.py` is as follows:

| Argument | Default | Description |
|---|---|---|
| `--build_dir` | 'build' | Build folder |
| `--batchsize` | 100 | Batch size used in training and validation - adjust for the memory capacity of your GPU(s) |
| `--epochs` | 3 | Number of training epochs |
| `--learnrate` | 0.001 | Initial learning rate for the optimizer |
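For example, to train for five epochs with a smaller batch size (assuming, as in the commands above, that `-d` is the short form of `--build_dir`):

```shell
(vitis-ai-pytorch) Vitis-AI /workspace > python -u train.py -d ${BUILD} --epochs 5 --batchsize 50
```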
To run step 1:

```shell
(vitis-ai-pytorch) Vitis-AI /workspace > python -u quantize.py -d ${BUILD} --quant_mode calib 2>&1 | tee ${LOG}/quant_calib.log
(vitis-ai-pytorch) Vitis-AI /workspace > python -u quantize.py -d ${BUILD} --quant_mode test 2>&1 | tee ${LOG}/quant_test.log
```
The Xilinx DPU family of ML accelerators executes models and networks whose parameters are in integer format, so we must convert the trained floating-point checkpoint into a fixed-point integer checkpoint - this process is known as quantization. The quantizer is first run in calibration mode ('calib') and then in test mode ('test') to evaluate the quantized model. Once quantization is finished, the quantized model can be found in the ./build/quant_model folder.
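For reference, the typical Vitis AI 1.4 PyTorch quantization flow is sketched below using the `pytorch_nndct` package shipped in the docker image. The model import from common.py and the file paths are assumptions for illustration; quantize.py may differ in detail:

```python
import torch
from pytorch_nndct.apis import torch_quantizer

from common import CNN  # assumed import of the model class from common.py

# load the trained floating-point checkpoint
model = CNN()
model.load_state_dict(torch.load('build/float_model/f_model.pth', map_location='cpu'))

# a dummy input fixes the input shape seen by the quantizer
rand_in = torch.randn(1, 1, 28, 28)

quant_mode = 'calib'  # 'calib' to calibrate, 'test' to evaluate and export
quantizer = torch_quantizer(quant_mode, model, (rand_in), output_dir='build/quant_model')
quantized_model = quantizer.quant_model

# ...run the test set through quantized_model here to calibrate/evaluate...

if quant_mode == 'calib':
    quantizer.export_quant_config()  # write the quantization parameters
else:
    quantizer.export_xmodel(output_dir='build/quant_model', deploy_check=False)
```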
The complete list of command line arguments of `quantize.py` is as follows:

| Argument | Default | Description |
|---|---|---|
| `--build_dir` | 'build' | Build folder |
| `--quant_mode` | 'calib' | Quantization script mode: 'calib' - quantize, 'test' - evaluate the quantized model |
| `--batchsize` | 100 | Batch size used in evaluation - adjust for the memory capacity of your GPU(s) |
To run step 2, execute the `compile.sh` shell script with one of the target boards as a command line argument, for example:

```shell
(vitis-ai-pytorch) Vitis-AI /workspace > source compile.sh zcu102 ${BUILD} ${LOG}
```
The `compile.sh` shell script will compile the quantized model and create an .xmodel file which contains the instructions and data to be executed by the DPU. The script also supports `zcu104`, `vck190` and `u50` as command line arguments to target the Zynq ZCU104, Versal VCK190 and Alveo U50 respectively. The compiled .xmodel will be written to the ./build/compiled_model folder and named CNN_<board_name>.xmodel.
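Under the hood, a script like compile.sh typically wraps a single call to the `vai_c_xir` compiler. A hedged sketch for the ZCU102 target (the quantized model filename and the arch.json path are assumptions based on the usual Vitis AI 1.4 docker layout):

```shell
# sketch of the core compiler invocation inside compile.sh
vai_c_xir \
  --xmodel     ./build/quant_model/CNN_int.xmodel \
  --arch       /opt/vitis_ai/compiler/arch/DPUCZDX8G/ZCU102/arch.json \
  --net_name   CNN_zcu102 \
  --output_dir ./build/compiled_model
```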
To prepare the images, .xmodel and application code for copying to the selected target, run the following command:

```shell
(vitis-ai-pytorch) Vitis-AI /workspace > python -u target.py --target zcu102 -d ${BUILD} 2>&1 | tee ${LOG}/target_zcu102.log
```
The script also supports `zcu104`, `vck190` and `u50` as possible values for the `--target` command line option.
The `target.py` script will do the following:

- Make a folder named ./build/target_<board_name>.
- Copy the appropriate compiled model to the ./build/target_<board_name> folder.
- Copy the Python application code to the ./build/target_<board_name> folder.
- Convert the MNIST test dataset to PNG image files - the number of images is set by the `--num_images` command line argument, which defaults to 10000 (see the sketch below).
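A minimal sketch of that dataset-to-PNG conversion (the output folder layout and filename pattern are illustrative assumptions; target.py also handles the model and application code copies):

```python
import os
from torchvision import datasets

# without a transform, torchvision yields each MNIST sample as a PIL image
test_set = datasets.MNIST('./build/dataset', train=False, download=True)

out_dir = './build/target_zcu102/images'
os.makedirs(out_dir, exist_ok=True)

num_images = 10000
for i, (img, label) in enumerate(test_set):
    if i == num_images:
        break
    # encode the ground-truth label in the filename so the app can score itself
    img.save(os.path.join(out_dir, f'{label}_{i}.png'))
```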
The complete list of command line arguments of `target.py` is as follows:

| Argument | Default | Description |
|---|---|---|
| `--build_dir` | 'build' | Build folder |
| `--target` | 'zcu102' | Name of the target board - zcu102, zcu104, vck190 or u50 |
| `--num_images` | 10000 | Number of MNIST test samples to convert to PNG images |
| `--app_dir` | 'application' | Folder containing the Python application code (app_mt.py) |
The entire `target_zcu102` folder will be copied to the /home/root folder of the flashed SD card on the ZCU102. This can be done in one of several ways:

- Direct copy to SD card:
  - If the host machine has an SD card slot, insert the flashed SD card; when it is recognised you will see two volumes, BOOT and ROOTFS. Navigate into ROOTFS and then into the /home folder. Make the ./root folder writeable by issuing the command `sudo chmod -R 777 root`, and then copy the entire `target_zcu102` folder from the host machine into the /home/root folder of the SD card.
  - Unmount both the BOOT and ROOTFS volumes from the host machine and then eject the SD card.
- With the scp command:
  - If the target evaluation board is connected to the same network as the host machine, the `target_zcu102` folder can be copied using scp.
  - The command will be something like `scp -r ./build/target_zcu102 root@192.168.1.227:~/.` assuming that the target board IP address is 192.168.1.227 - adjust this as appropriate for your system.
  - If prompted for a password, enter 'root'.
With the `target_zcu102` folder copied to the SD card and the evaluation board booted, you can issue the command for launching the application. Note that this is done on the target evaluation board, not the host machine, so it requires a connection to the board such as a serial connection to the UART or an SSH connection via Ethernet.

The application can be started by navigating into the `target_zcu102` folder on the evaluation board and then issuing the command `python3 app_mt.py -m CNN_zcu102.xmodel`. The application will start and after a few seconds will show the throughput in frames/sec, like this:
```shell
root@xilinx-zcu102-2021_1:~/target_zcu102# python3 app_mt.py -m CNN_zcu102.xmodel
Command line options:
 --image_dir : images
 --threads   : 1
 --model     : CNN_zcu102.xmodel
-------------------------------
Pre-processing 10000 images...
-------------------------------
Starting 1 threads...
-------------------------------
Throughput=3748.10 fps, total frames = 10000, time=2.6680 seconds
Correct:9877, Wrong:123, Accuracy:0.9877
-------------------------------
```
The performance can be increased by increasing the number of threads with the `--threads` argument:
```shell
root@xilinx-zcu102-2021_1:~/target_zcu102# python3 app_mt.py -m CNN_zcu102.xmodel --threads 4
Command line options:
 --image_dir : images
 --threads   : 4
 --model     : CNN_zcu102.xmodel
-------------------------------
Pre-processing 10000 images...
-------------------------------
Starting 4 threads...
-------------------------------
Throughput=6113.22 fps, total frames = 10000, time=1.6358 seconds
Correct:9877, Wrong:123, Accuracy:0.9877
-------------------------------
```
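Internally, app_mt.py splits the preprocessed images across the requested number of threads, each of which drives its own DPU runner. The following is a hedged sketch of such a per-thread loop using the VART Python API available in the board image; tensor shapes and the int8 handling are simplified assumptions (see the real app_mt.py for details):

```python
import numpy as np
import vart  # VART Python bindings, available in the target board image

def run_dpu_thread(runner, images, results, offset):
    """Push one thread's slice of the preprocessed images through the DPU."""
    input_dims = tuple(runner.get_input_tensors()[0].dims)    # e.g. (1, 28, 28, 1)
    output_dims = tuple(runner.get_output_tensors()[0].dims)  # e.g. (1, 10)
    for i, img in enumerate(images):
        in_buf = [np.asarray(img, dtype=np.int8).reshape(input_dims)]
        out_buf = [np.empty(output_dims, dtype=np.int8)]
        job_id = runner.execute_async(in_buf, out_buf)  # non-blocking submit
        runner.wait(job_id)                             # block until the job completes
        # argmax is unaffected by the DPU's fixed-point output scaling
        results[offset + i] = int(np.argmax(out_buf[0]))
```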
Note that the U50 will need to have been flashed with the correct deployment shell - this should have been done in the 'Preparing the host machine and target boards' section above.
The following steps should be run from inside the Vitis-AI docker container:

- Ensure that Vitis-AI's PyTorch conda environment is enabled (if not, run `conda activate vitis-ai-pytorch`).
- Run `source setup.sh DPUCAHX8H`, which sets environment variables to point to the correct overlay for the U50.

The complete steps to run are as follows:

```shell
conda activate vitis-ai-pytorch
source setup.sh DPUCAHX8H
cd build/target_u50
/usr/bin/python3 app_mt.py -m CNN_u50.xmodel
```
The console output will be like this:

```shell
(vitis-ai-pytorch) Vitis-AI /workspace/build/target_u50 > /usr/bin/python3 app_mt.py -m CNN_u50.xmodel
Command line options:
 --image_dir : images
 --threads   : 1
 --model     : CNN_u50.xmodel
-------------------------------
Pre-processing 10000 images...
-------------------------------
Starting 1 threads...
-------------------------------
Throughput=14362.58 fps, total frames = 10000, time=0.6963 seconds
Correct:9877, Wrong:123, Accuracy:0.9877
-------------------------------
```
Performance can be slightly increased by increasing the number of threads:
```shell
(vitis-ai-pytorch) Vitis-AI /workspace/build/target_u50 > /usr/bin/python3 app_mt.py -m CNN_u50.xmodel -t 6
Command line options:
 --image_dir : images
 --threads   : 6
 --model     : CNN_u50.xmodel
-------------------------------
Pre-processing 10000 images...
-------------------------------
Starting 6 threads...
-------------------------------
Throughput=16602.34 fps, total frames = 10000, time=0.6023 seconds
Correct:9877, Wrong:123, Accuracy:0.9877
-------------------------------
```
Copyright © 2020-2021 Xilinx