VAD (Vectorized Scene Representation for Efficient Autonomous Driving) is an end-to-end vectorized paradigm for autonomous driving. In this demo, we will use VAD-Tiny as our deployment target.
| Method | Backbone precision | Head precision | Framework | avg. L2 | Latency (ms) |
|---|---|---|---|---|---|
| VAD-Tiny | fp16 | fp32 | TensorRT | 0.78 | 90.2 (on Orin) |
```bash
cd /workspace
git clone https://github.com/hustvl/VAD.git
git clone https://github.com/NVIDIA/DL4AGX.git
```
Please follow the instructions in the official repo (install.md, prepare-dataset.md) to set up the VAD environment first. Then download `VAD_tiny.pth` from Google Drive to the folder `/workspace/VAD/ckpts`. You may verify your installation with:
```bash
cd /workspace/VAD
CUDA_VISIBLE_DEVICES=0 python tools/test.py /workspace/VAD/projects/configs/VAD/VAD_tiny_stage_2.py /workspace/VAD/ckpts/VAD_tiny.pth --launcher none --eval bbox --tmpdir tmp
```
This command is expected to output the benchmark results. This environment will be referred to as the torch container.
NOTE: You may need to adjust the original repo to reproduce the correct results.

- Use the following `img_norm_cfg`:
```python
img_norm_cfg = dict(mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
```
- If you get an attribute error in `/workspace/VAD/projects/mmdet3d_plugin/core/bbox/structures/lidar_box3d.py`, replace the import:
```python
# from mmdet3d.ops.roiaware_pool3d import points_in_boxes_gpu
from mmdet3d.ops import points_in_boxes_all as points_in_boxes_gpu
```
- For the best user experience, we recommend using torch >= 1.12. You may also build a docker image with the given `./dockerfile`. Here is an example command line to build and run it; you may change the volume-mapping arguments according to your setup.
```bash
cd /workspace/DL4AGX/AV-Solutions/vad-trt
docker build --network=host -f dockerfile . -t vad-trt
docker run --name=vad-trt -d -it --rm --shm-size=4096m --privileged --gpus all --network=host \
    -v /workspace:/workspace -v <path to nuscenes>:/data \
    vad-trt /bin/bash
```
To set up the deployment environment, you may run the following commands:
```bash
cd /workspace/DL4AGX/AV-Solutions/vad-trt/export_eval
ln -s /workspace/VAD/data data  # create a soft-link to the data folder
export PYTHONPATH=.:/workspace/VAD
```
As VAD is a temporal model, its inference behavior differs between the first frame and subsequent frames. When a frame has a previous frame, the model first applies a temporal warp to the previous frame's feature map and then concatenates the warped feature map with the current one.
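To make the warp-and-concatenate idea concrete, here is a minimal PyTorch sketch. It is not the actual VAD implementation: the BEV-style feature shapes, the 2D affine ego-motion transform, and the function name `warp_prev_bev` are all illustrative assumptions.

```python
# Illustrative sketch only (not the actual VAD code): warp the previous frame's
# feature map into the current ego frame, then concatenate along channels.
import torch
import torch.nn.functional as F

def warp_prev_bev(prev_feat, cur2prev):
    """prev_feat: [1, C, H, W] feature map from the previous frame.
    cur2prev: [2, 3] affine matrix (normalized coords) mapping current-frame
    grid locations into the previous frame; a stand-in for the ego-pose delta."""
    grid = F.affine_grid(cur2prev.unsqueeze(0), size=list(prev_feat.shape), align_corners=False)
    return F.grid_sample(prev_feat, grid, padding_mode="zeros", align_corners=False)

C, H, W = 64, 100, 100
prev_feat = torch.randn(1, C, H, W)
cur_feat = torch.randn(1, C, H, W)
cur2prev = torch.tensor([[1.0, 0.0, 0.05],   # identity rotation plus a small
                         [0.0, 1.0, 0.00]])  # translation as a placeholder
fused = torch.cat([warp_prev_bev(prev_feat, cur2prev), cur_feat], dim=1)
print(fused.shape)  # torch.Size([1, 128, 100, 100])
```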
To export the ONNX for the first frame, you may run:
```bash
python export_no_prev.py /workspace/VAD/projects/configs/VAD/VAD_tiny_stage_2.py /workspace/VAD/ckpts/VAD_tiny.pth --launcher none --eval bbox --tmpdir tmp
```
To export the ONNX for the subsequent frames, you may run:
```bash
python export_prev.py /workspace/VAD/projects/configs/VAD/VAD_tiny_stage_2.py /workspace/VAD/ckpts/VAD_tiny.pth --launcher none --eval bbox --tmpdir tmp
```
After these two commands, you are expected to see `vadv1.extract_img_feat`, `vadv1.pts_bbox_head.forward`, and `vadv1_prev.pts_bbox_head.forward` under `/workspace/DL4AGX/AV-Solutions/vad-trt/export_eval/scratch`. Each folder contains dumped input and output tensors in binary format, and an ONNX file beginning with `sim_`.
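If you want to double-check what was exported, one quick way is to list each graph's input and output names and shapes with the `onnx` package. This is only a convenience snippet; the file path below assumes the `scratch/` layout described above.

```python
# List input/output tensor names and shapes of an exported graph.
# Path assumes the scratch/ layout produced by the export scripts above.
import onnx

model = onnx.load("scratch/vadv1.extract_img_feat/sim_vadv1.extract_img_feat.onnx")

def tensor_info(value_info):
    # dim_value is set for static dims, dim_param for named dynamic dims
    dims = [d.dim_value if d.dim_value else d.dim_param
            for d in value_info.type.tensor_type.shape.dim]
    return value_info.name, dims

for inp in model.graph.input:
    print("input :", tensor_info(inp))
for out in model.graph.output:
    print("output:", tensor_info(out))
```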
We provide `test_tensorrt.py` to run the benchmark with TensorRT. It will produce results similar to the original PyTorch benchmark.
- To prepare dependencies for the benchmark:
```bash
pip install pycuda numpy==1.23
pip install <TensorRT Root>/python/tensorrt-<version>-cp38-none-linux_aarch64.whl
```
- Build plugins for the benchmark:
```bash
export TRT_ROOT=<path to your tensorrt dir>
cd /workspace/DL4AGX/AV-Solutions/vad-trt/plugins/
mkdir -p build && cd build
cmake ..
make
```
- Then you need to build the TensorRT engines:
```bash
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<TRT_ROOT>/lib
export PATH=$PATH:<TRT_ROOT>/bin

# build image encoder
trtexec --onnx=scratch/vadv1.extract_img_feat/sim_vadv1.extract_img_feat.onnx \
        --staticPlugins=../plugins/build/libplugins.so \
        --profilingVerbosity=detailed --dumpProfile \
        --separateProfileRun --useSpinWait --useManagedMemory \
        --fp16 \
        --saveEngine=vadv1.extract_img_feat/vadv1.extract_img_feat.fp16.engine

# build heads
trtexec --onnx=scratch/vadv1.pts_bbox_head.forward/sim_vadv1.pts_bbox_head.forward.onnx \
        --staticPlugins=../plugins/build/libplugins.so \
        --profilingVerbosity=detailed --dumpProfile \
        --separateProfileRun --useSpinWait --useManagedMemory \
        --saveEngine=vadv1.pts_bbox_head.forward/vadv1.pts_bbox_head.forward.engine

# build heads with prev_bev
trtexec --onnx=scratch/vadv1_prev.pts_bbox_head.forward/sim_vadv1_prev.pts_bbox_head.forward.onnx \
        --staticPlugins=../plugins/build/libplugins.so \
        --profilingVerbosity=detailed --dumpProfile \
        --separateProfileRun --useSpinWait --useManagedMemory \
        --saveEngine=vadv1_prev.pts_bbox_head.forward/vadv1_prev.pts_bbox_head.forward.engine
```
- Run the benchmark with TensorRT:
```bash
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<TensorRT Root>/lib
python test_tensorrt.py /workspace/VAD/projects/configs/VAD/VAD_tiny_stage_2.py ckpts/VAD_tiny.pth --launcher none --eval bbox --tmpdir tmp
```
As we replace the backend from PyTorch to TensorRT while keeping other parts such as data loading and evaluation unchanged, you should see outputs similar to the PyTorch benchmark.
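`test_tensorrt.py` takes care of all of this, but as a rough illustration of what swapping the backend involves, here is a minimal sketch of loading an engine together with the plugin library and running a single inference through the TensorRT Python API. The engine path, tensor contents, and the assumption of static binding shapes are placeholders; refer to `test_tensorrt.py` for the full pipeline.

```python
# Minimal sketch (placeholder I/O): deserialize an engine built with the custom
# plugins and run one synchronous inference.
import ctypes
import numpy as np
import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

ctypes.CDLL("../plugins/build/libplugins.so")  # make custom plugins visible to TensorRT
logger = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(logger, "")

with open("vadv1.extract_img_feat/vadv1.extract_img_feat.fp16.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate one host/device buffer per binding (assumes static shapes)
bindings, buffers = [], []
for i in range(engine.num_bindings):
    shape = tuple(engine.get_binding_shape(i))
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host = np.zeros(shape, dtype=dtype)        # real inputs would be filled here
    device = cuda.mem_alloc(host.nbytes)
    bindings.append(int(device))
    buffers.append((host, device, engine.binding_is_input(i)))

for host, device, is_input in buffers:
    if is_input:
        cuda.memcpy_htod(device, host)
context.execute_v2(bindings)                   # synchronous execution
for host, device, is_input in buffers:
    if not is_input:
        cuda.memcpy_dtoh(host, device)         # outputs now live in `host`
```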
This model is to be deployed on NVIDIA DRIVE Orin with TensorRT 8.6.13.3. We recommend using the NVIDIA DRIVE docker image `drive-agx-orin-linux-aarch64-sdk-build-x86:6.0.10.0-0009` as the cross-compile environment; this container will be referred to as the build container.
To launch the docker on the host x86 machine, you may run:
```bash
docker run --gpus all -it --network=host --rm \
    -v /workspace:/workspace \
    nvcr.io/drive/driveos-sdk/drive-agx-orin-linux-aarch64-sdk-build-x86:latest
```
To gain access to the docker image and the corresponding TensorRT, please join the DRIVE AGX SDK Developer Program. You can find more details on NVIDIA Drive site.
Similar to the benchmark section, we run the following commands inside the build container to build the plugins for NVIDIA DRIVE Orin.
```bash
# inside cross-compile environment
cd /workspace/DL4AGX/AV-Solutions/vad-trt/plugins/
mkdir -p build+orin && cd build+orin
cmake .. -DTARGET=aarch64
make
```
If everything goes well, you will see `libplugins.so` under `vad-trt/plugins/build+orin/`.
NOTE: If you encounter `No CMAKE_CUDA_COMPILER could be found.`, please run the command below to help cmake locate `nvcc`:
```bash
export PATH=$PATH:/usr/local/cuda/bin
```
Similar to what we did when building the plugins, you may run the following commands inside the build container.
```bash
# inside cross-compile environment
cd /workspace/DL4AGX/AV-Solutions/vad-trt/app
bash setup_dep.sh  # download dependencies (stb, cuOSD)
mkdir -p build+orin && cd build+orin
cmake -DTARGET=aarch64 .. && make
```
We expect to see `vad_app` under `vad-trt/app/build+orin/`.
In this demo run, we will set up everything under the folder `vad-trt/app/demo`.
- Prepare the plugin and the app:
```bash
cd /workspace/DL4AGX/AV-Solutions/vad-trt/
cp plugins/build+orin/libplugins.so app/demo/
cp app/build+orin/vad_app app/demo
```
- Prepare the input data and ONNX files. In the torch container environment, run:
```bash
cd /workspace/DL4AGX/AV-Solutions/vad-trt/export_eval
python save_data.py /workspace/VAD/projects/configs/VAD/VAD_tiny_stage_2.py /workspace/VAD/ckpts/VAD_tiny.pth --launcher none --eval bbox --tmpdir tmp
```
This will dump the necessary data files to `vad-trt/export_eval/demo_data/<numbers>/`. We can then move them by:
```bash
cd /workspace/DL4AGX/AV-Solutions/vad-trt/
cp -r export_eval/demo_data/ app/demo/data
cp -r export_eval/scratch/vadv1.extract_img_feat/ app/demo/onnx_files
cp -r export_eval/scratch/vadv1_prev.pts_bbox_head.forward/ app/demo/onnx_files
```
Now the `demo` folder should be organized as:
```
├── config.json
├── data/
│   ├── 1/
│   ├── 2/
│   └── ...
├── libplugins.so
├── onnx_files/
│   ├── vadv1.extract_img_feat
│   ├── vadv1_prev.pts_bbox_head.forward
│   └── vadv1.pts_bbox_head.forward
├── engines/
├── simhei.ttf
└── viz/
```
You may utilize `trtexec` to build the engines from the ONNX files on NVIDIA DRIVE Orin.
```bash
cd AV-Solutions/vad-trt/app/demo/

# build image encoder
trtexec --onnx=onnx_files/vadv1.extract_img_feat/sim_vadv1.extract_img_feat.onnx \
        --staticPlugins=./libplugins.so \
        --profilingVerbosity=detailed --dumpProfile \
        --separateProfileRun --useSpinWait --useManagedMemory \
        --fp16 \
        --saveEngine=engines/vadv1.extract_img_feat.fp16.engine

# build heads
trtexec --onnx=onnx_files/vadv1_prev.pts_bbox_head.forward/sim_vadv1_prev.pts_bbox_head.forward.onnx \
        --staticPlugins=./libplugins.so \
        --profilingVerbosity=detailed --dumpProfile \
        --separateProfileRun --useSpinWait --useManagedMemory \
        --saveEngine=engines/vadv1_prev.pts_bbox_head.forward.engine
```
To run the demo app, simply call:
```bash
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<TRT_ROOT>/lib
cd AV-Solutions/vad-trt/app/demo
./vad_app config.json
```
You may then find the visualization results under `vad-trt/app/demo/viz` in JPG format.
Similar to uniad-trt/README.md#results, we only show detection and planning results. You may find the video demo in `demo.webm`.
- VAD and its related code are licensed under Apache-2.0.
- cuOSD and its related code are licensed under MIT.