Overview of the optional performance features that are yet to be upstreamed
ROCm · Mar 25, 2024 · ea6ed38 · ea6ed38
1 parent a7164ca
commit ea6ed38
Showing 1 changed file with 14 additions and 0 deletions.
diff --git a/ROCm_performance.md b/ROCm_performance.md
@@ -0,0 +1,14 @@
+# Overview of the optional performance features unique to https://github.com/ROCm/vllm
+## Multi-GPU torchrun
+On ROCm, the default multi-GPU executor is `torchrun`, as opposed to `ray` on NVIDIA.  
+This can be overridden with the `--worker-use-ray` flag to vllm or its benchmarks.  
+To use torchrun parallelism, the run command should be modified from  
+`python <command>`  
+to  
+`torchrun --standalone --nnodes=1 --nproc-per-node=<world-size> <command>`
+## Triton attention
+The default attention function on ROCm uses the Triton attention kernel. To fall back to the https://github.com/ROCm/flash-attention implementation, set the following environment variable:  
+`VLLM_USE_FLASH_ATTN_TRITON=False`
+## Tunable ops
+PyTorch tunable ops are supported.  
+Define the following environment variable: `PYTORCH_TUNABLEOP_ENABLED=1` to enable both runtime tuning and the subsequent use of tuned results. To only use the tuned results without tuning any newly encountered shapes, also define `PYTORCH_TUNABLEOP_TUNING=0`
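
The three switches described in this file can be combined in a single launch. The following is a minimal Python sketch, not part of the commit, that assembles the command line and environment overrides from the flags and variables named above. The helper names `build_launch` and `as_shell_line`, the mode strings `"off"`, `"tune"`, and `"use_only"`, and the script name `my_benchmark.py` are hypothetical. It also assumes that `PYTORCH_TUNABLEOP_TUNING=0` is the value that disables tuning of new shapes, which matches PyTorch's TunableOp behavior as far as I know.

```python
import shlex


def build_launch(command, world_size=1, use_ray=False,
                 triton_attention=True, tunable_ops="off"):
    """Hypothetical helper: return (argv, env_overrides) for one run.

    command     -- script and its arguments, as a list of strings
    tunable_ops -- "off", "tune" (tune and use results), or "use_only"
    """
    if tunable_ops not in ("off", "tune", "use_only"):
        raise ValueError(f"unknown tunable_ops mode: {tunable_ops!r}")

    env = {}
    if not triton_attention:
        # Fall back to the ROCm flash-attention implementation.
        env["VLLM_USE_FLASH_ATTN_TRITON"] = "False"
    if tunable_ops != "off":
        env["PYTORCH_TUNABLEOP_ENABLED"] = "1"
    if tunable_ops == "use_only":
        # Assumption: 0 reuses tuned results without tuning new shapes.
        env["PYTORCH_TUNABLEOP_TUNING"] = "0"

    if use_ray:
        # Override the torchrun default with the ray executor.
        argv = ["python", *command, "--worker-use-ray"]
    elif world_size == 1:
        # A single process does not need a launcher.
        argv = ["python", *command]
    else:
        argv = ["torchrun", "--standalone", "--nnodes=1",
                f"--nproc-per-node={world_size}", *command]
    return argv, env


def as_shell_line(argv, env):
    """Render the launch as a single shell line, for display or logging."""
    prefix = [f"{key}={shlex.quote(value)}" for key, value in env.items()]
    return " ".join(prefix + [shlex.quote(arg) for arg in argv])


if __name__ == "__main__":
    argv, env = build_launch(["my_benchmark.py"], world_size=4,
                             tunable_ops="use_only")
    print(as_shell_line(argv, env))
```

To actually start the run, the result could be passed to `subprocess.run(argv, env={**os.environ, **env})`. The sketch only builds the command so that it can be inspected first.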