[DOCS] fix typo in npu llm inference guide and update wwb link (#28365)
### Details:
- *fixed optimum-cli argument formatting in the NPU LLM inference guide*
- *updated the link for who_what_benchmark, which was moved to the tools directory some time ago*
eaidova authored Jan 10, 2025
1 parent 65e6ab4 commit 5249b69
Showing 2 changed files with 5 additions and 5 deletions.
Original file line number Diff line number Diff line change
@@ -44,7 +44,7 @@ You select one of the methods by setting the ``--group-size`` parameter to either
.. code-block:: console
:name: group-quant
-      optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int4 --sym --ratio 1.0 --group_size 128 TinyLlama-1.1B-Chat-v1.0
+      optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int4 --sym --ratio 1.0 --group-size 128 TinyLlama-1.1B-Chat-v1.0
.. tab-item:: Channel-wise quantization

@@ -63,12 +63,12 @@ You select one of the methods by setting the ``--group-size`` parameter to either
If you want to improve accuracy, make sure you:

1. Update NNCF: ``pip install nncf==2.13``
-2. Use ``--scale_estimation --dataset=<dataset_name>`` and accuracy aware quantization ``--awq``:
+2. Use ``--scale_estimation --dataset <dataset_name>`` and accuracy aware quantization ``--awq``:

.. code-block:: console
:name: channel-wise-data-aware-quant
-      optimum-cli export openvino -m meta-llama/Llama-2-7b-chat-hf --weight-format int4 --sym --group-size -1 --ratio 1.0 --awq --scale-estimation --dataset=wikitext2 Llama-2-7b-chat-hf
+      optimum-cli export openvino -m meta-llama/Llama-2-7b-chat-hf --weight-format int4 --sym --group-size -1 --ratio 1.0 --awq --scale-estimation --dataset wikitext2 Llama-2-7b-chat-hf
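As an editor's aside, the difference between ``--group-size 128`` and ``--group-size -1`` (channel-wise) in the commands above can be illustrated with a quick calculation. The sketch below is not part of optimum-cli; it only shows the grouping arithmetic, and the 2048 hidden size is just an illustrative value.

```python
# Sketch: how group size affects the number of quantization scales
# stored per weight-matrix row when exporting with --weight-format int4.
# group_size=128 -> one scale per 128 input channels;
# group_size=-1  -> channel-wise, one scale per output channel (row).

def scales_per_row(in_channels: int, group_size: int) -> int:
    """Number of INT4 scale factors stored for one output channel."""
    if group_size == -1:  # channel-wise: a single scale for the whole row
        return 1
    assert in_channels % group_size == 0, "row must divide evenly into groups"
    return in_channels // group_size

hidden = 2048  # illustrative hidden size
print(scales_per_row(hidden, 128))  # group quantization: 16 scales per row
print(scales_per_row(hidden, -1))   # channel-wise: 1 scale per row
```

Smaller groups mean more scales (finer-grained quantization, usually better accuracy); channel-wise is the coarsest option and relies on data-aware methods such as ``--awq`` and ``--scale-estimation`` to recover accuracy.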
.. important::
@@ -354,7 +354,7 @@ To find the optimal weight compression parameters for a particular model, refer
`example <https://github.com/openvinotoolkit/nncf/tree/develop/examples/llm_compression/openvino/tiny_llama_find_hyperparams>`__,
where weight compression parameters are searched over a subset of values.
To speed up the search, a self-designed validation pipeline called
-`WhoWhatBench <https://github.com/openvinotoolkit/openvino.genai/tree/master/llm_bench/python/who_what_benchmark>`__
+`WhoWhatBench <https://github.com/openvinotoolkit/openvino.genai/tree/master/tools/who_what_benchmark>`__
is used. The pipeline can quickly evaluate the changes in the accuracy of the optimized
model compared to the baseline.

@@ -491,7 +491,7 @@ Additional Resources
- `OpenVINO GenAI Repo <https://github.com/openvinotoolkit/openvino.genai>`__
: Repository containing example pipelines that implement image and text generation
tasks. It also provides a tool to benchmark LLMs.
-- `WhoWhatBench <https://github.com/openvinotoolkit/openvino.genai/tree/master/llm_bench/python/who_what_benchmark>`__
+- `WhoWhatBench <https://github.com/openvinotoolkit/openvino.genai/tree/master/tools/who_what_benchmark>`__
- `NNCF GitHub <https://github.com/openvinotoolkit/nncf>`__
- :doc:`Post-training Quantization <quantizing-models-post-training>`
- :doc:`Training-time Optimization <compressing-models-during-training>`
