
Commit

updated quick_docs
Signed-off-by: eplatero <[email protected]>
eplatero97 committed Dec 4, 2024
1 parent 5517a2f commit 2e13c74
1 changed file with 1 addition and 1 deletion: docs/source/quick_start.md
@@ -151,7 +151,7 @@ qeff_model.generate(prompts=["My name is"])
End-to-end demo examples for various models are available in the **notebooks** directory. Please check them out.

### Draft-Based Speculative Decoding
Draft-based speculative decoding is a technique in which a small Draft Language Model (DLM) autoregressively generates `num_speculative_tokens` speculative tokens ahead of the Target Language Model (TLM). The goal is for the DLM to predict the tokens the TLM itself would have generated. This approach pays off when the TLM's autoregressive decode phase is memory-bound: the hardware then has spare compute, so the TLM can validate all of the DLM's speculations in one batched pass instead of producing one token per pass.
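The draft-then-verify loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not QEfficient's implementation: the "models" are hypothetical greedy next-token functions (`toy_target`, `toy_draft`), and `speculative_decode` and `greedy_decode` are illustrative helpers. Only `num_speculative_tokens` mirrors a name used in this guide. Acceptance here is exact greedy matching, one common verification rule; sampling-based variants use a probabilistic acceptance test instead.

```python
from typing import Callable, List

# Hypothetical stand-in for a language model: token ids in, greedy next token id out.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: one model call per generated token."""
    seq = list(prompt)
    for _ in range(max_new_tokens):
        seq.append(model(seq))
    return seq[len(prompt):]


def speculative_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[int],
    max_new_tokens: int,
    num_speculative_tokens: int,
) -> List[int]:
    """Greedy draft-then-verify decoding; returns only the generated tokens."""
    seq = list(prompt)
    generated: List[int] = []
    while len(generated) < max_new_tokens:
        # 1) The DLM speculates num_speculative_tokens tokens autoregressively.
        speculation: List[int] = []
        for _ in range(num_speculative_tokens):
            speculation.append(draft(seq + speculation))

        # 2) The TLM predicts the next token at all k+1 positions. On real hardware
        #    this is one batched forward pass; this sketch calls it once per prefix.
        verified = [
            target(seq + speculation[:i]) for i in range(num_speculative_tokens + 1)
        ]

        # 3) Keep the longest prefix on which the DLM agreed with the TLM, then take
        #    the TLM's own token at the first mismatch (or the extra token after
        #    all speculations were accepted).
        accepted: List[int] = []
        for proposed, expected in zip(speculation, verified):
            if proposed != expected:
                break
            accepted.append(proposed)
        accepted.append(verified[len(accepted)])

        for tok in accepted:
            if len(generated) == max_new_tokens:
                break
            seq.append(tok)
            generated.append(tok)
    return generated


def toy_target(seq: List[int]) -> int:
    # Hypothetical deterministic "TLM".
    return (3 * seq[-1] + len(seq)) % 10


def toy_draft(seq: List[int]) -> int:
    # Hypothetical "DLM": agrees with the target except at every 4th position.
    guess = toy_target(seq)
    return guess if len(seq) % 4 else (guess + 1) % 10


if __name__ == "__main__":
    prompt = [1, 2]
    out = speculative_decode(toy_target, toy_draft, prompt, max_new_tokens=12, num_speculative_tokens=3)
    assert out == greedy_decode(toy_target, prompt, max_new_tokens=12)
    print(out)
```

Because every kept token is one the TLM itself chose, the output matches plain greedy decoding with the TLM; the potential speedup comes from the TLM checking up to `num_speculative_tokens + 1` positions per pass instead of one.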

To export and compile both the DLM and the TLM, set `is_tlm` and `num_speculative_tokens` for the TLM, and export the DLM as you would any other QEfficient LLM:


