- Tested with Vitis-AI 1.4, TensorFlow 2.3 and ZCU102 evaluation board
- Dataset: ImageNet ILSVRC2012
- Network: MobileNet
Machine learning models such as Convolutional Neural Networks are generally trained and saved using floating-point formats. However the Xilinx family of DPU accelerators operate on fixed point integer values (usually 8bit) and so this requires our original floating-point model to be quantized. The process of converting a model whose values have a very wide dynamic range to a functionally equivalent model but with values that have a much more limited dynamic range can lead to a small loss in accuracy.
Some models are affected more than others by this change in dynamic range and when this happens it may be necessary to use quantization fine-tuning to recover accuracy.
This tutorial will show you how to implement the Vitis-AI quantization fine-tuning for TensorFlow2.3. The steps we will run are as follows:
- Download the Imagenet training and validation datasets and convert them to TFRecord format.
- Train, evaluate and save the floating-point MobileNet model.
- Quantization of the trained model without fine-tuning to demonstrate accuracy loss.
- Quantization with fine-tuning to recover the lost accuracy and bring the quantized model back up to the same accuracy as the floating-point model.
- Compiling the fine-tuned model for the target board
- Execution of the compiled model on the target board
This tutorial assumes the user is familiar with Python3, TensorFlow and has some knowledge of machine learning principles.
The complete flow is shown in the figures below:
We will use a MobileNet that has been pre-trained on ImageNet as our starting point. The Keras MobileNet model is instantiated as a base model but without including the final classification layers. We then add a new classification head that consists of a dropout layer, a 10-output dense/fully-connected layer and a softmax layer.
The MobileNet base model is set to be 'frozen' and will not be trained at all, only the new 10-output classification head will be trained.
The complete code for the MobileNet model is contained in the xfer_model.py file. The base model is instantiated with the ImageNet weights but no classifier head layers:
base_model = MobileNet(input_shape=input_shape,
alpha=1.0,
depth_multiplier=1,
include_top=False,
weights='imagenet',
pooling='avg')
..and then set to be non-trainable:
base_model.trainable = False
The output of the base model is then fed into a new classifier head which is parameterized for the number of classes we are using:
net = base_model.output
# add output classifier head
net = Dropout(0.01)(net)
output_layer = Dense(classes)(net)
The host machine has several requirements that need to be met before we begin. You will need:
-
An x86 host machine with a supported OS and either the CPU or GPU versions of the Vitis-AI docker installed - see System Requirements.
-
The host machine will require Docker to be installed and the Vitis-AI CPU or GPU docker image to be built - see Getting Started.
-
A GPU card suitable for training is recommended, but the training in this tutorial is quite simple and a CPU can be used.
-
If you plan to use the ZCU102 evaluation board, it should be prepared with the board image as per the Step2: Setup the Target instructions. Hints on how to connect the various cables to the ZCU102 are also available here.
For more details, refer to the latest version of the Vitis AI User Guide (UG1414).
This tutorial assumes the user is familiar with Python3, TensorFlow and has some knowledge of machine learning principles.
-
Copy this repository by doing either of the following:
- Download the repository as a ZIP file to the host machine, and then unzip the archive.
- From a terminal, use the
git clone
command.
-
Open a linux terminal,
cd
to the repository folder, and thencd
to thefiles
folder. -
Start either the Vitis AI GPU or CPU docker (we recommend using the GPU docker if possible):
# navigate to tutorial folder cd <path_to_tutorial>/files # to start GPU docker container ./docker_run.sh xilinx/vitis-ai-gpu:latest # to start CPU docker container ./docker_run.sh xilinx/vitis-ai-cpu:latest
The docker container will start and after accepting the license agreement, you should see something like this in the terminal:
```shell
==========================================
__ ___ _ _ _____
\ \ / (_) | (_) /\ |_ _|
\ \ / / _| |_ _ ___ ______ / \ | |
\ \/ / | | __| / __|______/ /\ \ | |
\ / | | |_| \__ \ / ____ \ _| |_
\/ |_|\__|_|___/ /_/ \_\_____|
==========================================
Docker Image Version: 1.4.776
Build Date: 2021-06-22
VAI_ROOT: /opt/vitis_ai
For TensorFlow 1.15 Workflows do:
conda activate vitis-ai-tensorflow
For Caffe Workflows do:
conda activate vitis-ai-caffe
For Neptune Workflows do:
conda activate vitis-ai-neptune
For PyTorch Workflows do:
conda activate vitis-ai-pytorch
For TensorFlow 2.3 Workflows do:
conda activate vitis-ai-tensorflow2
For Darknet Optimizer Workflows do:
conda activate vitis-ai-optimizer_darknet
For Caffe Optimizer Workflows do:
conda activate vitis-ai-optimizer_caffe
For TensorFlow 1.15 Workflows do:
conda activate vitis-ai-optimizer_tensorflow
For LSTM Workflows do:
conda activate vitis-ai-lstm
Vitis-AI /workspace >
```
💡 If you get a "Permission Denied" error when starting the docker container, it is almost certainly because the docker_run.sh script is not set to be executable. You can fix this by running the following command:
chmod +x docker_run.sh
Activate the Tensorflow2 python virtual environment with conda activate vitis-ai-tensorflow2
and you should see the prompt change to indicate that the environment is active:
Vitis-AI /workspace > conda activate vitis-ai-tensorflow2
(vitis-ai-tensorflow2) Vitis-AI /workspace >
The remainder of this README describes each single step to implement the tutorial - each command needs to be run from within the Vitis-AI Docker container which was started in the previous section.
A shell script called run_all.sh is also provided - this contains the commands needed to run the complete flow:
source run_all.sh
-
Download ILSVRC2012_img_val.tar from Academic Torrents or other source. Place ILSVRC2012_img_val.tar in the <path_to_repo>/files folder.
-
Download ILSVRC2012_img_train.tar from Academic Torrents or other source. Place ILSVRC2012_img_train.tar in the <path_to_repo>/files folder.
-
Convert the images in the ILSVRC2012_img_val.tar archive to TFRecords (requires an internet connection). The
-mc 10
option will ensure that only the first 10 classes are used instead of all 1000 classes:
python -u make_val_tfrec.py -mc 10 2>&1 | tee make_val_tfrec.log
..this should create a folder called tfrec_val that contains a single TFRecord file - if you use more classes then more than one TFRecord file may be generated. The ILSVRC2012_img_val.tar archive can be deleted at this point to save disk space.
- Convert the images in the ILSVRC2012_img_train.tar archive to TFRecords (requires internet connection). The
-mc 10
option will ensure that only the first 10 classes are used - these are obviously the same 10 classes used when converting the validation set:
python -u make_train_tfrec.py -mc 10 2>&1 | tee make_train_tfrec.log
..this should create a folder called tfrec_train that contains 10 TFRecord files. If required, the ILSVRC2012_img_train.tar archive can be deleted at this point to save disk space.
Train the floating-point MobileNet model. The -mc 10
option will ensure that the classifer output vector is of the correct shape and matches the number of classes used to create the TFRecords:
python -u train.py -mc 10 2>&1 | tee train.log
After training is completed, the trained floating-point model is saved as float_model/float.h5 ...evaluation will also be run and should give an accuracy of approximately 85%.
Next we will run quantization without fine-tuning followed by evaluation. This step is just to demonstrate that normal quantization does not give sufficient accuracy:
python -u quant.py -e 2>&1 | tee quant.log
..evaluation should give an accuracy of approximately 65% which is much lower than the floating-point accuracy.
Now we can run quantization with fine-tuning:
python -u quant_ft.py 2>&1 | tee quant_ft.log
This is effectively a form of retraining of the quantized model. First we load the floating-point model:
float_model = load_model(float_dir+'/float_model.h5', compile=False)
..then we create a quantized model that can be fine-tuned:
# Quantization-aware training model
quantizer = vitis_quantize.VitisQuantizer(float_model)
ft_model = quantizer.get_qat_model()
After that, training is run using the .compile() and .fit() methods of the tf.keras API.
Once the quantization-aware training has completed, evaluation will be run and should give an accuracy of approximately 85% which should be almost identical to the original floating-point model accuracy.
To run step 3, run the compile.sh
with one of the target boards as a command line argument, for example:
source compile.sh zcu102
The compile.sh
script also supports zcu104
and vck190
as command line arguments. The compile.sh
shell script will compile the quantized model and create an .xmodel file which contains the instructions and data to be executed by the DPU in the compiled_model_ folder.
Run make_target.py with the appropriate command lines options to create a folder ready to be copied to the target board, for example:
python -u make_target.py -m compiled_model_zcu102/mobilenet.xmodel -td target_zcu102 2>&1 | tee logs/target_zcu102.log
The entire target_<xxx>
folder needs to be copied to the /home/root folder of the flashed SD card, this can be done in one of several ways:
- Direct copy to SD Card:
-
If the host machine has an SD card slot, insert the flashed SD card and when it is recognised you will see two volumes, BOOT and ROOTFS. Navigate into the ROOTFS and then into the /home folder. Make the ./root folder writeable by issuing the command
sudo chmod -R 777 root
and then copy the entiretarget
folder from the host machine into the /home/root folder of the SD card. -
Unmount both the BOOT and ROOTFS volumes from the host machine and then eject the SD Card from the host machine.
- With scp command:
-
If the target evaluation board is connected to the same network as the host machine, the
target_<xxx>
folder can be copied using scp. -
The command will be something like
scp -r ./build/target_zcu102 [email protected]:~/.
assuming that the target board IP address is 192.168.1.227 - adjust this as appropriate for your system. -
If the password is asked for, insert 'root'.
With the target_<xxx>
folder copied to the SD Card and the evaluation board booted, you can issue the command for launching the application - note that this done on the target evaluation board, not the host machine, so it requires a connection to the board such as a serial connection to the UART or an SSH connection via Ethernet.
The application can be started by navigating into the target_<xxx>
folder on the evaluation board and then issuing the command python3 app_mt.py
. The application will start and after a few seconds will show the accuracy and throughput in frames/sec, like this:
root@xilinx-zcu102-2020_2:~/target# python3 app_mt.py
-----------------------------------------------
Command line options:
--image_dir : images
--threads : 1
--model : mobilenet.xmodel
-----------------------------------------------
Found 500 images - processing 500 of them
Found 1 subgraphs in mobilenet.xmodel
Pre-processing 500 images...
Starting 1 threads...
-----------------------------------------------
Correct:416, Wrong:84, Accuracy:0.8320
-----------------------------------------------
FPS: 194.12, total frames: 500, total time: 2.576 seconds
-----------------------------------------------
The throughput can be improved by increasing the number of threads with the --threads
option:
root@xilinx-zcu102-2020_2:~/target# python3 app_mt.py --threads 6
-----------------------------------------------
Command line options:
--image_dir : images
--threads : 6
--model : mobilenet.xmodel
-----------------------------------------------
Found 500 images - processing 500 of them
Found 1 subgraphs in mobilenet.xmodel
Pre-processing 500 images...
Starting 6 threads...
-----------------------------------------------
Correct:416, Wrong:84, Accuracy:0.8320
-----------------------------------------------
FPS: 731.21, total frames: 500, total time: 0.684 seconds
-----------------------------------------------