Commit

Merge branch 'main' into dev
Anindyadeep authored Apr 29, 2024
2 parents 2c2cddc + 80bba57 commit 178f317
Showing 2 changed files with 3 additions and 4 deletions.
1 change: 0 additions & 1 deletion README.md
@@ -22,7 +22,6 @@
</ol>
</details>


## 🥽 Quick glance towards performance benchmarks

Take a first glance at [Mistral 7B v0.1 Instruct](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) and [Llama 2 7B Chat](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) Performance Metrics Across Different Precision and Inference Engines. Here is our run specification that generated this performance benchmark reports.
6 changes: 3 additions & 3 deletions docs/llama2.md
@@ -5,7 +5,7 @@
**Environment:**
- Model: Llama 2 7B Chat
- CUDA Version: 12.1
-- Command: `./benchmark.sh --repetitions 10 --max_tokens 512 --device cuda --prompt 'Write an essay about the transformer model architecture'`
+- Command: `./benchmark.sh --repetitions 10 --max_tokens 512 --device cuda --model llama --prompt 'Write an essay about the transformer model architecture'`

**Performance Metrics:** (unit: Tokens / second)

@@ -27,7 +27,7 @@
| [Nvidia TensorRT-LLM](/bench_tensorrtllm/) | 55.19 ± 1.03 | 85.03 ± 0.62 | 167.66 ± 2.05 | 235.18 ± 3.20 |


-*(Data updated: `17th April 2024`)
+*(Data updated: `29th April 2024`)


## M2 MAX 32GB Inference Bench:
@@ -58,4 +58,4 @@
| [llama.cpp](/bench_llamacpp/) | - | - | 30.11 ± 0.45 | 44.27 ± 0.12 |
| [ctransformers](/bench_ctransformers/) | - | - | 20.75 ± 0.36 | 34.04 ± 2.11 |

-*(Data updated: `17th April 2024`)
+*(Data updated: `29th April 2024`)
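
The tables in `docs/llama2.md` report throughput as `value ± spread` in tokens per second, and the command runs 10 repetitions. The sketch below shows one plausible way such a cell could be computed. It assumes the value is the mean and the spread is the sample standard deviation across repetitions; the diff does not confirm either. The function names `summarize_throughput` and `format_cell` are hypothetical and are not taken from the repository.

```python
import statistics


def summarize_throughput(token_counts, durations_s):
    """Return (mean, sample std) of tokens/second across repetitions.

    Assumption: each repetition yields a generated-token count and a
    wall-clock duration in seconds. This is an illustration, not the
    repository's actual benchmarking code.
    """
    if len(token_counts) != len(durations_s):
        raise ValueError("token_counts and durations_s must have equal length")
    if not token_counts:
        raise ValueError("at least one repetition is required")

    # Per-repetition throughput in tokens per second.
    rates = [n / t for n, t in zip(token_counts, durations_s)]

    mean = statistics.mean(rates)
    # The sample standard deviation needs at least two data points.
    std = statistics.stdev(rates) if len(rates) > 1 else 0.0
    return mean, std


def format_cell(mean, std):
    """Format a table cell in the 'mean ± std' style used by the tables."""
    return f"{mean:.2f} ± {std:.2f}"
```

For example, `format_cell(*summarize_throughput(counts, durations))` would produce one cell for a given engine and precision, where `counts` and `durations` hold the per-repetition measurements.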