Merge branch 'main' into dev

premAI-io · Apr 29, 2024 · 178f317
2 parents 2c2cddc + 80bba57
Showing 2 changed files with 3 additions and 4 deletions.
diff --git a/README.md b/README.md
@@ -22,7 +22,6 @@
  </ol>
 </details>
 
-
 ## 🥽 Quick glance towards performance benchmarks
 
 Take a first glance at [Mistral 7B v0.1 Instruct](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) and [Llama 2 7B Chat](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) Performance Metrics Across Different Precision and Inference Engines. Here is our run specification that generated this performance benchmark reports.

diff --git a/docs/llama2.md b/docs/llama2.md
@@ -5,7 +5,7 @@
 **Environment:**
 - Model: Llama 2 7B Chat
 - CUDA Version: 12.1
-- Command: `./benchmark.sh --repetitions 10 --max_tokens 512 --device cuda --prompt 'Write an essay about the transformer model architecture'`
+- Command: `./benchmark.sh --repetitions 10 --max_tokens 512 --device cuda --model llama --prompt 'Write an essay about the transformer model architecture'`
 
 **Performance Metrics:** (unit: Tokens / second)
 
@@ -27,7 +27,7 @@
 | [Nvidia TensorRT-LLM](/bench_tensorrtllm/) | 55.19 ± 1.03  | 85.03 ± 0.62  | 167.66 ± 2.05 | 235.18 ± 3.20  |
 
 
-*(Data updated: `17th April 2024`)
+*(Data updated: `29th April 2024`)
 
 
 ## M2 MAX 32GB Inference Bench:
@@ -58,4 +58,4 @@
 | [llama.cpp](/bench_llamacpp/)          | -       | -       | 30.11 ± 0.45 | 44.27 ± 0.12 |
 | [ctransformers](/bench_ctransformers/) | -       | -       | 20.75 ± 0.36 | 34.04 ± 2.11 |
 
-*(Data updated: `17th April 2024`)
+*(Data updated: `29th April 2024`)