diff --git a/docs/articles_en/learn-openvino/llm_inference_guide/genai-guide-npu.rst b/docs/articles_en/learn-openvino/llm_inference_guide/genai-guide-npu.rst
index 60253779b0f3dc..8fb6ad27c4232f 100644
--- a/docs/articles_en/learn-openvino/llm_inference_guide/genai-guide-npu.rst
+++ b/docs/articles_en/learn-openvino/llm_inference_guide/genai-guide-npu.rst
@@ -44,7 +44,7 @@ You select one of the methods by setting the ``--group-size`` parameter to eithe
       .. code-block:: console
          :name: group-quant

-         optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int4 --sym --ratio 1.0 --group_size 128 TinyLlama-1.1B-Chat-v1.0
+         optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int4 --sym --ratio 1.0 --group-size 128 TinyLlama-1.1B-Chat-v1.0

    .. tab-item:: Channel-wise quantization

@@ -63,12 +63,12 @@ You select one of the methods by setting the ``--group-size`` parameter to eithe
    If you want to improve accuracy, make sure you:

    1. Update NNCF: ``pip install nncf==2.13``
-   2. Use ``--scale_estimation --dataset=<dataset_name>`` and accuracy aware quantization ``--awq``:
+   2. Use ``--scale_estimation --dataset <dataset_name>`` and accuracy aware quantization ``--awq``:

       .. code-block:: console
          :name: channel-wise-data-aware-quant

-         optimum-cli export openvino -m meta-llama/Llama-2-7b-chat-hf --weight-format int4 --sym --group-size -1 --ratio 1.0 --awq --scale-estimation --dataset=wikitext2 Llama-2-7b-chat-hf
+         optimum-cli export openvino -m meta-llama/Llama-2-7b-chat-hf --weight-format int4 --sym --group-size -1 --ratio 1.0 --awq --scale-estimation --dataset wikitext2 Llama-2-7b-chat-hf

 .. important::

diff --git a/docs/articles_en/openvino-workflow/model-optimization-guide/weight-compression.rst b/docs/articles_en/openvino-workflow/model-optimization-guide/weight-compression.rst
index 046dde9661c3bb..4b752b74187768 100644
--- a/docs/articles_en/openvino-workflow/model-optimization-guide/weight-compression.rst
+++ b/docs/articles_en/openvino-workflow/model-optimization-guide/weight-compression.rst
@@ -354,7 +354,7 @@ To find the optimal weight compression parameters for a particular model, refer
 `example <https://github.com/openvinotoolkit/nncf/tree/develop/examples/llm_compression/openvino/tiny_llama_find_hyperparams>`__
 , where weight compression parameters are being searched from the subset of values.
 To speed up the search, a self-designed validation pipeline called
-`WhoWhatBench <https://github.com/openvinotoolkit/openvino.genai/tree/master/llm_bench/python/who_what_benchmark>`__
+`WhoWhatBench <https://github.com/openvinotoolkit/openvino.genai/tree/master/tools/who_what_benchmark>`__
 is used. The pipeline can quickly evaluate the changes in the accuracy of the optimized
 model compared to the baseline.

@@ -491,7 +491,7 @@ Additional Resources
 - `OpenVINO GenAI Repo <https://github.com/openvinotoolkit/openvino.genai>`__
   : Repository containing example pipelines that implement image and text generation
   tasks. It also provides a tool to benchmark LLMs.
-- `WhoWhatBench <https://github.com/openvinotoolkit/openvino.genai/tree/master/llm_bench/python/who_what_benchmark>`__
+- `WhoWhatBench <https://github.com/openvinotoolkit/openvino.genai/tree/master/tools/who_what_benchmark>`__
 - `NNCF GitHub <https://github.com/openvinotoolkit/nncf>`__
 - :doc:`Post-training Quantization <quantizing-models-post-training>`
 - :doc:`Training-time Optimization <compressing-models-during-training>`
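
For reference, the two ``optimum-cli`` commands in the first file have a direct
equivalent in NNCF's Python API, which the same docs pin at ``nncf==2.13``. The
sketch below is illustrative rather than part of the documented flow: the IR
paths are assumed to come from a prior export, and the calibration set for the
data-aware options is left as a placeholder.

.. code-block:: python

   import openvino as ov
   import nncf

   core = ov.Core()
   # IR produced by a prior ``optimum-cli export openvino`` run (path is illustrative)
   model = core.read_model("TinyLlama-1.1B-Chat-v1.0/openvino_model.xml")

   # Group quantization, mirroring: --sym --ratio 1.0 --group-size 128
   compressed = nncf.compress_weights(
       model,
       mode=nncf.CompressWeightsMode.INT4_SYM,
       ratio=1.0,
       group_size=128,  # one quantization scale per block of 128 weights
   )
   ov.save_model(compressed, "TinyLlama-1.1B-Chat-v1.0-int4/openvino_model.xml")

   # Channel-wise, data-aware variant, mirroring:
   # --sym --group-size -1 --ratio 1.0 --awq --scale-estimation --dataset wikitext2
   # (requires calibration samples wrapped in nncf.Dataset; omitted here)
   # compressed = nncf.compress_weights(
   #     model,
   #     mode=nncf.CompressWeightsMode.INT4_SYM,
   #     ratio=1.0,
   #     group_size=-1,  # -1 switches to per-output-channel scales
   #     awq=True,
   #     scale_estimation=True,
   #     dataset=nncf.Dataset(calibration_samples),
   # )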
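
The exported directory can then be consumed by the OpenVINO GenAI pipeline that
the NPU guide targets. A minimal usage sketch, assuming the model directory
produced by the first command and an available NPU device:

.. code-block:: python

   import openvino_genai as ov_genai

   # Directory produced by the optimum-cli export above
   pipe = ov_genai.LLMPipeline("TinyLlama-1.1B-Chat-v1.0", "NPU")
   print(pipe.generate("What is OpenVINO?", max_new_tokens=100))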