Mixtral readme update (#29)
* Updated readme for Mixtral

Signed-off-by: quic-amitraj <[email protected]>

* Fixed error and broken links

Signed-off-by: quic-amitraj <[email protected]>

* Fixed table

Signed-off-by: quic-amitraj <[email protected]>

* Fixed bug

Signed-off-by: quic-amitraj <[email protected]>

* Fixed bug

Signed-off-by: quic-amitraj <[email protected]>

---------

Signed-off-by: quic-amitraj <[email protected]>
quic-amitraj authored May 28, 2024
1 parent 893de86 commit 369f453
Showing 1 changed file with 7 additions and 8 deletions.
README.md: 7 additions & 8 deletions
@@ -11,7 +11,8 @@

*Latest news* :fire: <br>

-- [coming soon] support for more popular [models](#models-coming-soon) and inference optimization techniques like continuous batching and speculative decoding <br>
+- [coming soon] Support for more popular [models](#models-coming-soon) and inference optimization techniques like continuous batching and speculative decoding <br>
+- [05/2024] Added support for [Mixtral-8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1).
- [04/2024] Initial release of [efficient transformers](https://github.com/quic/efficient-transformers) for seamless inference on pre-trained LLMs.

## Train anywhere, Infer on Qualcomm Cloud AI with a Developer-centric Toolchain
@@ -37,7 +38,6 @@ For other models, there is comprehensive documentation to inspire upon the chang

## Validated Models

-
* [GPT2](https://huggingface.co/openai-community/gpt2)
* [Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
* [Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf)
@@ -48,10 +48,10 @@ For other models, there is comprehensive documentation to inspire upon the chang
* [Salesforce/xgen-7b-8k-base](https://huggingface.co/Salesforce/xgen-7b-8k-base)
* [MPT-7b](https://huggingface.co/mosaicml/mpt-7b)
* [Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1)
+* [Mixtral-8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)

-
-**Models Coming Soon..**
-* [Mixtral-8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)
+## Models Coming Soon
+
* [Falcon-40b](https://huggingface.co/tiiuae/falcon-40b)
* [Starcoder2-15b](https://huggingface.co/bigcode/starcoder2-15b)
* [Phi-3](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct)
@@ -110,18 +110,17 @@ In summary:

| High Level APIs | Sample use | Arguments |
|-----------------|------------|-------------------|
-
| QEfficient.cloud.infer | [click here](#1-use-qefficientcloudinfer) | <li>model_name : $\color{green} {Mandatory}$</li> <li>num_cores : $\color{green} {Mandatory}$</li> <li>device_group : $\color{green} {Mandatory}$</li><li>batch_size : Optional [Default-1]</li> <li>prompt_len : Optional [Default-32]</li> <li>ctx_len : Optional [Default-128]</li><li>mxfp6 : Optional </li> <li>mxint8 : Optional </li><li>hf_token : Optional </li><li>cache_dir : Optional ["cache_dir" in current working directory]</li><li>**prompt : Optional</li><li>**prompts_txt_file_path : Optional</li>|
| QEfficient.cloud.execute | [click here](#2-use-of-qefficientcloudexcute) | <li>model_name : $\color{green} {Mandatory}$</li> <li>device_group : $\color{green} {Mandatory}$</li><li>qpc_path : $\color{green} {Mandatory}$</li><li>prompt : Optional [Default-"My name is"]</li> <li>cache_dir : Optional ["cache_dir" in current working directory]</li><li>hf_token : Optional </li><li>**prompt : Optional</li><li>**prompts_txt_file_path : Optional</li> |

-**One argument, prompt or prompts_txt_file_path must be passed.
+**One argument, prompt or prompts_txt_file_path must be passed.**

### 1. Use QEfficient.cloud.infer

This is the single e2e python api in the library, which takes model_card name as input along with other compile args if necessary and does everything in one go.

* Torch Download → Optimize for Cloud AI 100 → Export to ONNX → Verify (CPU) → Compile on Cloud AI 100 → [Execute](#2-use-of-qefficientcloudexecute)
-* Its skips the ONNX export/compile stage if ONNX file or qpc found on path
+* It skips the ONNX export/compile stage if ONNX file or qpc found on path


```bash
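For context, a minimal usage sketch of the two high-level APIs documented in the table above, assuming each listed argument maps to a `--flag`-style option of the corresponding Python module; the model name, core count, device IDs, and qpc path below are illustrative placeholders, not values taken from this commit:

```bash
# Sketch only: flag spellings are assumed from the argument table and the
# values are placeholders. For infer, model_name, num_cores and device_group
# are mandatory, and exactly one of --prompt / --prompts_txt_file_path must
# be passed. This runs the full pipeline: download -> optimize -> ONNX
# export -> verify on CPU -> compile for Cloud AI 100 -> execute.
python -m QEfficient.cloud.infer \
    --model_name gpt2 \
    --num_cores 14 \
    --device_group [0] \
    --batch_size 1 \
    --prompt_len 32 \
    --ctx_len 128 \
    --mxfp6 \
    --prompt "My name is"

# Re-run a previously compiled model: qpc_path (mandatory) points at the
# compiled program produced by the infer step above, skipping export/compile.
python -m QEfficient.cloud.execute \
    --model_name gpt2 \
    --device_group [0] \
    --qpc_path <path/to/qpc_dir> \
    --prompt "My name is"
```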
