Commit
Showing 57 changed files with 1,890 additions and 6,083 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,6 +3,8 @@ name: pre-commit | |
on: | ||
pull_request: | ||
branches: [main] | ||
push: | ||
branches: [main] | ||
|
||
jobs: | ||
pre-commit: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
name: Update README | ||
|
||
on: | ||
push: | ||
branches: ["main"] | ||
paths: | ||
- README.md.template | ||
|
||
jobs: | ||
update-readme: | ||
runs-on: ubuntu-latest | ||
steps: | ||
- name: Checkout Code Repository | ||
uses: actions/checkout@v3 | ||
|
||
- name: Update README | ||
run: sed "s|<LAST_UPDATE>|$(date -u +"%dth %B %Y")|g" README.md.template > README.md | ||
|
||
- name: Commit changes | ||
run: | | ||
git config --global user.email "[email protected]" | ||
git config --global user.name "GitHub Actions" | ||
git add README.md | ||
git commit -m "Update <LAST_UPDATE> placeholder in README.md" || true | ||
- name: Push changes | ||
uses: ad-m/github-push-action@master | ||
with: | ||
github_token: ${{ secrets.GITHUB_TOKEN }} | ||
branch: ${{ github.ref }} |
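Note on the `Update README` step above: `date -u +"%dth %B %Y"` appends a literal `th` to a zero-padded day, so the 1st, 2nd, 3rd, or 21st of a month would be rendered as `01th`, `02th`, `03th`, or `21th`. A minimal sketch of one way to build the string with the right suffix follows. The function names `ordinal_suffix` and `format_last_update` are hypothetical and are not part of this commit.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (assumptions for illustration, not from the commit).

# Print the English ordinal suffix for a day of the month.
ordinal_suffix() {
  local day=$((10#$1))   # force base 10 so "08" and "09" are not read as octal
  case "$day" in
    11|12|13) echo "th" ;;   # the teens are always "th"
    *1) echo "st" ;;
    *2) echo "nd" ;;
    *3) echo "rd" ;;
    *)  echo "th" ;;
  esac
}

# Build a string such as "3rd March 2024" from day, month name, and year.
format_last_update() {
  local day=$((10#$1)) month=$2 year=$3
  echo "${day}$(ordinal_suffix "$day") ${month} ${year}"
}

# Possible use inside the workflow step (untested against the real repository):
#   stamp=$(format_last_update "$(date -u +%d)" "$(date -u +%B)" "$(date -u +%Y)")
#   sed "s|<LAST_UPDATE>|${stamp}|g" README.md.template > README.md
```

The helpers take the day, month, and year as arguments rather than calling `date` themselves, which keeps them easy to check with fixed inputs.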
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,117 @@ | ||
# benchmarks | ||
Benchmarks of MLOps engines, frameworks, and languages over mainstream AI models. | ||
|
||
## Structure | ||
|
||
The repository is organized to facilitate benchmark management and execution through a consistent structure: | ||
|
||
- Each benchmark, identified as `bench_name`, has a dedicated folder, `bench_{bench_name}`. | ||
- Within these benchmark folders, a common script named `bench.sh` handles setup, environment configuration, and execution. | ||
|
||
### Benchmark Script | ||
|
||
The `bench.sh` script supports key parameters: | ||
|
||
- `prompt`: Benchmark-specific prompt. | ||
- `max_tokens`: Maximum tokens for the benchmark. | ||
- `repetitions`: Number of benchmark repetitions. | ||
- `log_file`: File for storing benchmark logs. | ||
- `device`: Device for benchmark execution (cpu, cuda, metal). | ||
- `models_dir`: Directory containing necessary model files. | ||
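
As an illustration of how a script might accept the parameters listed above, here is a minimal flag-parsing sketch. It is not the repository's actual `bench.sh`: the function name `parse_args`, the variable names, and every default value are assumptions. Because the README uses both `--repetitions` and `--num_repetitions` in different places, the sketch accepts either spelling.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of flag parsing for a bench.sh-style script.
# All defaults below are illustrative assumptions, not repository values.
parse_args() {
  PROMPT="Explain what is a transformer"
  MAX_TOKENS=100
  REPETITIONS=10
  LOG_FILE="benchmark.log"
  DEVICE="cpu"
  MODELS_DIR="./models"

  while [ "$#" -gt 0 ]; do
    # Every flag in this sketch takes exactly one value.
    if [ "$#" -lt 2 ]; then
      echo "Missing value for $1" >&2
      return 1
    fi
    case "$1" in
      --prompt)                        PROMPT="$2" ;;
      --max_tokens)                    MAX_TOKENS="$2" ;;
      --repetitions|--num_repetitions) REPETITIONS="$2" ;;
      --log_file)                      LOG_FILE="$2" ;;
      --device)                        DEVICE="$2" ;;
      --models_dir)                    MODELS_DIR="$2" ;;
      *) echo "Unknown option: $1" >&2; return 1 ;;
    esac
    shift 2
  done

  # Accept only the device values named in the parameter list.
  case "$DEVICE" in
    cpu|cuda|metal) ;;
    *) echo "Unsupported device: $DEVICE" >&2; return 1 ;;
  esac
}

# Possible use at the top of a script:
#   parse_args "$@" || exit 1
```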
|
||
### Unified Execution | ||
|
||
An overarching `bench.sh` script streamlines benchmark execution: | ||
|
||
- Downloads essential files for benchmarking. | ||
- Iterates through all benchmark folders in the repository. | ||
|
||
Users can therefore run every benchmark in one pass or a single benchmark on its own. To run a specific benchmark, navigate to the corresponding benchmark folder (e.g., `bench_{bench_name}`) and execute its `bench.sh` script with the required parameters. | ||
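
The iteration over benchmark folders described in this section could look roughly like the sketch below. It assumes only the `bench_{bench_name}/bench.sh` layout stated above. The function name `run_all_benchmarks` is hypothetical, and the download step is omitted because the README does not say which files are fetched.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: run every bench_*/bench.sh under a root directory,
# forwarding all remaining arguments unchanged to each script.
run_all_benchmarks() {
  local root=$1
  shift
  local script status=0
  for script in "$root"/bench_*/bench.sh; do
    [ -f "$script" ] || continue      # skip the unexpanded pattern when nothing matches
    echo "Running $script"
    bash "$script" "$@" || status=1   # keep going, but remember any failure
  done
  return "$status"
}

# Possible use from the repository root:
#   run_all_benchmarks . --device cpu --max_tokens 100
```

Continuing after a failed benchmark is a design choice made for this sketch: one broken engine does not block the others, and the non-zero return value still signals that something failed.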
|
||
|
||
|
||
## Usage | ||
|
||
```bash | ||
# Run a specific benchmark | ||
./bench_{bench_name}/bench.sh --prompt <value> --max_tokens <value> --num_repetitions <value> --log_file <file_path> --device <cpu/cuda/metal> --models_dir <path_to_models> | ||
|
||
# Run all benchmarks collectively | ||
./bench.sh --prompt <value> --max_tokens <value> --num_repetitions <value> --log_file <file_path> --device <cpu/cuda/metal> --models_dir <path_to_models> | ||
``` | ||
|
||
|
||
## ML Engines: Feature Table | ||
|
||
| Features | pytorch | burn | llama.cpp | candle | tinygrad | onnxruntime | CTranslate2 | | ||
| --------------------------- | ------- | ---- | --------- | ------ | -------- | ----------- | ----------- | | ||
| Inference support | β | β | β | β | β | β | β | | ||
| 16-bit quantization support | β | β | β | β | β | β | β | | ||
| 8-bit quantization support | β | β | β | β | β | β | β | | ||
| 4-bit quantization support | β | β | β | β | β | β | β | | ||
| 2/3bit quantization support | β | β | β | β | β | β | β | | ||
| CUDA support | β | β | β | β | β | β | β | | ||
| ROCM support | β | β | β | β | β | β | β | | ||
| Intel OneAPI/SYCL support | β ** | β | β | β | β | β | β | | ||
| Mac M1/M2 support | β | β | β | β | β | β | β | | ||
| BLAS support(CPU) | β | β | β | β | β | β | β | | ||
| Model Parallel support | β | β | β | β | β | β | β | | ||
| Tensor Parallel support | β | β | β | β | β | β | β | | ||
| Onnx Format support | β | β | β | β | β | β | β | | ||
| Training support | β | π | β | π | β | β | β | | ||
|
||
β = No Metal Support | ||
π = Partial Support for Training (Finetuning already works, but training from scratch may not work) | ||
|
||
## Benchmarking ML Engines | ||
|
||
### A100 80GB Inference Bench: | ||
|
||
Model: LLAMA-2-7B | ||
|
||
CUDA Version: 11.7 | ||
|
||
Command: `./benchmark.sh --repetitions 10 --max_tokens 100 --device gpu --nvidia --prompt 'Explain what is a transformer'` | ||
|
||
| Engine | float32 | float16 | int8 | int4 | | ||
|-------------|--------------|---------------|---------------|---------------| | ||
| burn | 13.12 ± 0.85 | - | - | - | | ||
| candle | - | 36.78 ± 2.17 | - | - | | ||
| llama.cpp | - | - | 84.48 ± 3.76 | 106.76 ± 1.29 | | ||
| ctranslate | - | 51.38 ± 16.01 | 36.12 ± 11.93 | - | | ||
| tinygrad | - | 20.32 ± 0.06 | - | - | | ||
|
||
*(data updated: <LAST_UPDATE>) | ||
|
||
|
||
### M2 MAX 32GB Inference Bench: | ||
|
||
#### CPU | ||
|
||
Model: LLAMA-2-7B | ||
|
||
CUDA Version: NA | ||
|
||
Command: `./benchmark.sh --repetitions 10 --max_tokens 100 --device cpu --prompt 'Explain what is a transformer'` | ||
|
||
| Engine | float32 | float16 | int8 | int4 | | ||
|-------------|--------------|--------------|--------------|--------------| | ||
| burn | 0.30 ± 0.09 | - | - | - | | ||
| candle | - | 3.43 ± 0.02 | - | - | | ||
| llama.cpp | - | - | 14.41 ± 1.59 | 20.96 ± 1.94 | | ||
| ctranslate | - | - | 2.11 ± 0.73 | - | | ||
| tinygrad | - | 4.21 ± 0.38 | - | - | | ||
|
||
#### GPU (Metal) | ||
|
||
Command: `./benchmark.sh --repetitions 10 --max_tokens 100 --device gpu --prompt 'Explain what is a transformer'` | ||
|
||
| Engine | float32 | float16 | int8 | int4 | | ||
|-------------|--------------|--------------|--------------|--------------| | ||
| burn | - | - | - | - | | ||
| candle | - | - | - | - | | ||
| llama.cpp | - | - | 31.24 ± 7.82 | 46.75 ± 9.55 | | ||
| ctranslate | - | - | - | - | | ||
| tinygrad | - | 29.78 ± 1.18 | - | - | | ||
|
||
*(data updated: <LAST_UPDATE>) |