diff --git a/notebooks/254-llm-chatbot/254-rag-chatbot.ipynb b/notebooks/254-llm-chatbot/254-rag-chatbot.ipynb
index feb37f8404b..6bd280a2cf0 100644
--- a/notebooks/254-llm-chatbot/254-rag-chatbot.ipynb
+++ b/notebooks/254-llm-chatbot/254-rag-chatbot.ipynb
@@ -83,7 +83,7 @@
     "\"accelerate\"\\\n",
     "\"openvino-nightly\"\\\n",
     "\"gradio\"\\\n",
-    "\"onnx\" \"chromadb\" \"sentence_transformers\" \"langchain>=0.1.7\" \"langchainhub\" \"transformers>=4.37.0\" \"unstructured\" \"scikit-learn\" \"python-docx\" \"pdfminer.six\" \"bitsandbytes\""
+    "\"onnx\" \"einops\" \"transformers_stream_generator\" \"tiktoken\" \"transformers>=4.38.1\" \"bitsandbytes\" \"chromadb\" \"sentence_transformers\" \"langchain>=0.1.7\" \"langchainhub\" \"unstructured\" \"scikit-learn\" \"python-docx\" \"pdfminer.six\""
    ]
   },
   {
@@ -122,7 +122,6 @@
     "    except OSError:\n",
     "        notebook_login()\n",
     "```\n",
-    "* **mini-cpm-2b-dpo** - MiniCPM is an End-Size LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. After Direct Preference Optimization (DPO) fine-tuning, MiniCPM outperforms many popular 7b, 13b and 70b models. More details can be found in [model_card](https://huggingface.co/openbmb/MiniCPM-2B-dpo-fp16).\n",
     "* **red-pajama-3b-chat** - A 2.8B parameter pre-trained language model based on GPT-NEOX architecture. It was developed by Together Computer and leaders from the open-source AI community. The model is fine-tuned on OASST1 and Dolly2 datasets to enhance chatting ability. More details about model can be found in [HuggingFace model card](https://huggingface.co/togethercomputer/RedPajama-INCITE-Chat-3B-v1).\n",
     "* **gemma-7b-it** - Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. This model is instruction-tuned version of 7B parameters model. More details about model can be found in [model card](https://huggingface.co/google/gemma-7b-it).\n",
     ">**Note**: run model with demo, you will need to accept license agreement. \n",
@@ -156,7 +155,7 @@
     "    except OSError:\n",
     "        notebook_login()\n",
     "```\n",
-    "* **qwen1.5-0.5b-chat/qwen1.5-1.8b-chat/qwen1.5-7b-chat** - Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Qwen1.5 is a language model series including decoder language models of different model sizes. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, mixture of sliding window attention and full attention. You can find more details about model in the [model repository](https://huggingface.co/Qwen).\n",
+    "* **qwen1.5-1.8b-chat/qwen1.5-7b-chat** - Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Qwen1.5 is a language model series including decoder language models of different model sizes. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, mixture of sliding window attention and full attention. You can find more details about model in the [model repository](https://huggingface.co/Qwen).\n",
     "* **qwen-7b-chat** - Qwen-7B is the 7B-parameter version of the large language model series, Qwen (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen-7B is a Transformer-based large language model, which is pretrained on a large volume of data, including web texts, books, codes, etc. For more details about Qwen, please refer to the [GitHub](https://github.com/QwenLM/Qwen) code repository.\n",
     "* **mpt-7b-chat** - MPT-7B is part of the family of MosaicPretrainedTransformer (MPT) models, which use a modified transformer architecture optimized for efficient training and inference. These architectural changes include performance-optimized layer implementations and the elimination of context length limits by replacing positional embeddings with Attention with Linear Biases ([ALiBi](https://arxiv.org/abs/2108.12409)). Thanks to these modifications, MPT models can be trained with high throughput efficiency and stable convergence. MPT-7B-chat is a chatbot-like model for dialogue generation. It was built by finetuning MPT-7B on the [ShareGPT-Vicuna](https://huggingface.co/datasets/jeffwan/sharegpt_vicuna), [HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3), [Alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca), [HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf), and [Evol-Instruct](https://huggingface.co/datasets/victor123/evol_instruct_70k) datasets. More details about the model can be found in [blog post](https://www.mosaicml.com/blog/mpt-7b), [repository](https://github.com/mosaicml/llm-foundry/) and [HuggingFace model card](https://huggingface.co/mosaicml/mpt-7b-chat).\n",
     "* **chatglm3-6b** - ChatGLM3-6B is the latest open-source model in the ChatGLM series. While retaining many excellent features such as smooth dialogue and low deployment threshold from the previous two generations, ChatGLM3-6B employs a more diverse training dataset, more sufficient training steps, and a more reasonable training strategy. ChatGLM3-6B adopts a newly designed [Prompt format](https://github.com/THUDM/ChatGLM3/blob/main/PROMPT_en.md), in addition to the normal multi-turn dialogue. You can find more details about model in the [model card](https://huggingface.co/THUDM/chatglm3-6b)\n",
@@ -165,8 +164,7 @@
     "* **neural-chat-7b-v3-1** - Mistral-7b model fine-tuned using Intel Gaudi. The model fine-tuned on the open source dataset [Open-Orca/SlimOrca](https://huggingface.co/datasets/Open-Orca/SlimOrca) and aligned with [Direct Preference Optimization (DPO) algorithm](https://arxiv.org/abs/2305.18290). More details can be found in [model card](https://huggingface.co/Intel/neural-chat-7b-v3-1) and [blog post](https://medium.com/@NeuralCompressor/the-practice-of-supervised-finetuning-and-direct-preference-optimization-on-habana-gaudi2-a1197d8a3cd3).\n",
     "* **notus-7b-v1** - Notus is a collection of fine-tuned models using [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290). and related [RLHF](https://huggingface.co/blog/rlhf) techniques. This model is the first version, fine-tuned with DPO over zephyr-7b-sft. Following a data-first approach, the only difference between Notus-7B-v1 and Zephyr-7B-beta is the preference dataset used for dDPO. Proposed approach for dataset creation helps to effectively fine-tune Notus-7b that surpasses Zephyr-7B-beta and Claude 2 on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/). More details about model can be found in [model card](https://huggingface.co/argilla/notus-7b-v1).\n",
     "* **youri-7b-chat** - Youri-7b-chat is a Llama2 based model. [Rinna Co., Ltd.](https://rinna.co.jp/) conducted further pre-training for the Llama2 model with a mixture of English and Japanese datasets to improve Japanese task capability. The model is publicly released on Hugging Face hub. You can find detailed information at the [rinna/youri-7b-chat project page](https://huggingface.co/rinna/youri-7b). \n",
-    "* **baichuan2-7b-chat** - Baichuan 2 is the new generation of large-scale open-source language models launched by [Baichuan Intelligence inc](https://www.baichuan-ai.com/home). It is trained on a high-quality corpus with 2.6 trillion tokens and has achieved the best performance in authoritative Chinese and English benchmarks of the same size.\n",
-    "* **internlm2-chat-1.8b** - InternLM2 is the second generation InternLM series. Compared to the previous generation model, it shows significant improvements in various capabilities, including reasoning, mathematics, and coding. More details about model can be found in [model repository](https://huggingface.co/internlm)."
+    "* **baichuan2-7b-chat** - Baichuan 2 is the new generation of large-scale open-source language models launched by [Baichuan Intelligence inc](https://www.baichuan-ai.com/home). It is trained on a high-quality corpus with 2.6 trillion tokens and has achieved the best performance in authoritative Chinese and English benchmarks of the same size."
    ]
   },
   {
@@ -186,15 +184,15 @@
     "name": "stderr",
     "output_type": "stream",
     "text": [
-     "2024-03-06 07:05:19.617312: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n",
-     "2024-03-06 07:05:19.620814: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.\n",
-     "2024-03-06 07:05:19.663621: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
-     "2024-03-06 07:05:19.663653: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
-     "2024-03-06 07:05:19.663683: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
-     "2024-03-06 07:05:19.671963: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.\n",
-     "2024-03-06 07:05:19.673938: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n",
+     "2024-03-07 23:06:47.788169: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n",
+     "2024-03-07 23:06:47.791855: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.\n",
+     "2024-03-07 23:06:47.834258: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
+     "2024-03-07 23:06:47.834288: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
+     "2024-03-07 23:06:47.834330: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
+     "2024-03-07 23:06:47.842773: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.\n",
+     "2024-03-07 23:06:47.844036: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n",
     "To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
-     "2024-03-06 07:05:20.726709: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n"
+     "2024-03-07 23:06:48.759435: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n"
     ]
    }
   ],
@@ -242,7 +240,7 @@
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
-      "model_id": "5875d10008c442c38ff1d90da874b8dc",
+      "model_id": "8f481cf6f5af459495323c305c9f2b14",
       "version_major": 2,
       "version_minor": 0
      },
@@ -272,32 +270,32 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 15,
+   "execution_count": 31,
    "id": "184d1678-0e73-4f35-8af5-1a7d291c2e6e",
    "metadata": {},
    "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
-      "model_id": "c8d393ddf227409d84313cde097d9896",
+      "model_id": "815948450b6b441cb33407d6e888ec84",
       "version_major": 2,
       "version_minor": 0
      },
     "text/plain": [
-     "Dropdown(description='Model:', options=('tiny-llama-1b-chat', 'gemma-2b-it', 'red-pajama-3b-chat', 'gemma-7b-i…"
+     "Dropdown(description='Model:', index=2, options=('qwen1.5-1.8b-chat', 'qwen1.5-7b-chat', 'qwen-7b-chat', 'chat…"
     ]
    },
-   "execution_count": 15,
+   "execution_count": 31,
    "metadata": {},
    "output_type": "execute_result"
   }
  ],
   "source": [
-   "llm_model_ids = list(SUPPORTED_LLM_MODELS[model_language.value])\n",
+   "llm_model_ids = [model_id for model_id, model_config in SUPPORTED_LLM_MODELS[model_language.value].items() if model_config.get(\"rag_prompt_template\")]\n",
    "\n",
    "llm_model_id = widgets.Dropdown(\n",
    "    options=llm_model_ids,\n",
-   "    value=llm_model_ids[4],\n",
+   "    value=llm_model_ids[-1],\n",
    "    description=\"Model:\",\n",
    "    disabled=False,\n",
    ")\n",
@@ -307,7 +305,7 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 16,
+  "execution_count": 11,
   "id": "49ea95f8",
   "metadata": {},
   "outputs": [
   {
    "name": "stdout",
    "output_type": "stream",
    "text": [
-    "Selected LLM model tiny-llama-1b-chat\n"
+    "Selected LLM model internlm2-chat-1.8b\n"
    ]
   }
  ],
@@ -382,14 +380,14 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 10,
+  "execution_count": 12,
   "id": "c6a38153",
   "metadata": {},
   "outputs": [
   {
    "data": {
     "application/vnd.jupyter.widget-view+json": {
-     "model_id": "10a3596a41864effbe8fb9d81723f3ed",
+     "model_id": "4d6fd33052574569bb607d72148933d2",
      "version_major": 2,
      "version_minor": 0
     },
@@ -403,7 +401,7 @@
   {
    "data": {
     "application/vnd.jupyter.widget-view+json": {
-     "model_id": "da04e6b87e41474194e2de8219da7303",
+     "model_id": "52cae9ef3fca42029949c79ce046b52e",
      "version_major": 2,
      "version_minor": 0
     },
@@ -417,7 +415,7 @@
   {
    "data": {
     "application/vnd.jupyter.widget-view+json": {
-     "model_id": "0532ba4230d440aeb3f10cd7becf9156",
+     "model_id": "2dfd7e4fd16f4e0785929d8f3fceb088",
      "version_major": 2,
      "version_minor": 0
     },
@@ -455,7 +453,7 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 11,
+  "execution_count": 14,
   "id": "2020d522",
   "metadata": {},
   "outputs": [],
@@ -654,7 +652,7 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 12,
+  "execution_count": 15,
   "id": "8e127215",
   "metadata": {},
   "outputs": [
   {
    "name": "stdout",
    "output_type": "stream",
    "text": [
-    "Size of model with INT4 compressed weights is 1837.58 MB\n"
+    "Size of FP16 model is 1819.91 MB\n"
    ]
   }
  ],
@@ -698,22 +696,22 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 17,
+  "execution_count": 16,
   "id": "ff80e6eb-7923-40ef-93d8-5e6c56e50667",
   "metadata": {},
   "outputs": [
   {
    "data": {
     "application/vnd.jupyter.widget-view+json": {
-     "model_id": "d7e6f5925ad0446ca94e882a8c6503fc",
+     "model_id": "10bc2ae1b14d4cb69f0db31fe6133643",
      "version_major": 2,
      "version_minor": 0
     },
    "text/plain": [
-    "Dropdown(description='Embedding Model:', options=('all-mpnet-base-v2',), value='all-mpnet-base-v2')"
+    "Dropdown(description='Embedding Model:', options=('all-mpnet-base-v2', 'text2vec-large-chinese'), value='all-m…"
    ]
   },
-  "execution_count": 17,
+  "execution_count": 16,
   "metadata": {},
   "output_type": "execute_result"
  }
@@ -736,7 +734,7 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 9,
+  "execution_count": 17,
   "id": "790afcf8",
   "metadata": {},
   "outputs": [
@@ -755,7 +753,7 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 10,
+  "execution_count": 18,
   "id": "58d75dad-2eeb-4edd-8d12-d77a365f8eda",
   "metadata": {
    "scrolled": true
@@ -789,14 +787,14 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 11,
+  "execution_count": 19,
   "id": "e11e73cf",
   "metadata": {},
   "outputs": [
   {
    "data": {
     "application/vnd.jupyter.widget-view+json": {
-     "model_id": "ee9a2eefa59e420693ba647d8d5b70c6",
+     "model_id": "d92f5255fdad4d3cbd056e52e99f5ee9",
      "version_major": 2,
      "version_minor": 0
     },
    "text/plain": [
     "Dropdown(description='Device:', options=('CPU', 'AUTO'), value='CPU')"
    ]
   },
-  "execution_count": 11,
+  "execution_count": 19,
   "metadata": {},
   "output_type": "execute_result"
  }
@@ -823,7 +821,7 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 12,
+  "execution_count": 20,
   "id": "9ab29b85",
   "metadata": {},
   "outputs": [
@@ -851,14 +849,14 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 13,
+  "execution_count": 21,
   "id": "6d044d01",
   "metadata": {},
   "outputs": [
   {
    "data": {
     "application/vnd.jupyter.widget-view+json": {
-     "model_id": "b31a332c59b847269e7a395d34319ad6",
+     "model_id": "21accfb73c1a45b19b331ef2159dcd81",
      "version_major": 2,
      "version_minor": 0
     },
    "text/plain": [
     "Dropdown(description='Device:', options=('CPU', 'AUTO'), value='CPU')"
    ]
   },
-  "execution_count": 13,
+  "execution_count": 21,
   "metadata": {},
   "output_type": "execute_result"
  }
@@ -884,7 +882,7 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 14,
+  "execution_count": 22,
   "id": "348b90fe",
   "metadata": {},
   "outputs": [
@@ -924,7 +922,7 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 15,
+  "execution_count": 23,
   "id": "df3e8fd1-d4c1-4e33-b46e-7840e392f8ee",
   "metadata": {},
   "outputs": [],
@@ -965,7 +963,7 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 16,
+  "execution_count": 24,
   "id": "efe29701",
   "metadata": {},
   "outputs": [],
@@ -975,14 +973,14 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 17,
+  "execution_count": 25,
   "id": "8b014f24-aa5b-4d40-924d-d579ad7fcec6",
   "metadata": {},
   "outputs": [
   {
    "data": {
     "application/vnd.jupyter.widget-view+json": {
-     "model_id": "d40ce9ed20ac455ab3a92366690955a1",
+     "model_id": "3138dedc2102456db478482b1260c704",
      "version_major": 2,
      "version_minor": 0
     },
    "text/plain": [
     "Dropdown(description='Model to run:', options=('FP16',), value='FP16')"
    ]
   },
-  "execution_count": 17,
+  "execution_count": 25,
   "metadata": {},
   "output_type": "execute_result"
  }
@@ -1016,7 +1014,7 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 18,
+  "execution_count": 26,
   "id": "f7f708db-8de1-4efd-94b2-fcabc48d52f4",
   "metadata": {},
   "outputs": [
@@ -1024,81 +1022,13 @@
    "name": "stdout",
    "output_type": "stream",
    "text": [
-    "Loading model from chatglm3-6b/FP16\n"
+    "Loading model from internlm2-chat-1.8b/FP16\n"
    ]
   },
-  {
-   "data": {
-    "application/vnd.jupyter.widget-view+json": {
-     "model_id": "fcdbf25a78d84edaaf992eef0ff48814",
-     "version_major": 2,
-     "version_minor": 0
-    },
-    "text/plain": [
-     "tokenizer_config.json: 0%| | 0.00/1.41k [00:00
diff --git a/notebooks/254-llm-chatbot/llm_config.py b/notebooks/254-llm-chatbot/llm_config.py
         Answer: <|assistant|>""",
     },
-    "neural-chat-7b-v3-1": {
-        "model_id": "Intel/neural-chat-7b-v3-3",
-        "remote": False,
-        "start_message": f"[INST] <<SYS>>\n{DEFAULT_SYSTEM_PROMPT }\n<</SYS>>\n\n",
-        "history_template": "{user}[/INST]{assistant}[INST]",
-        "current_message_template": "{user} [/INST]{assistant}",
-        "tokenizer_kwargs": {"add_special_tokens": False},
-        "partial_text_processor": llama_partial_text_processor,
-        "rag_prompt_template": f""" [INST] {DEFAULT_RAG_PROMPT } [/INST] """
-        + """
-        [INST] Question: {question}
-        Context: {context}
-        Answer: [/INST]""",
-    },
     "notus-7b-v1": {
         "model_id": "argilla/notus-7b-v1",
         "remote": False,
@@ -185,6 +171,20 @@ def internlm_partial_text_processor(partial_text, new_text):
         Answer: <|assistant|>""",
     },
+    "neural-chat-7b-v3-1": {
+        "model_id": "Intel/neural-chat-7b-v3-3",
+        "remote": False,
+        "start_message": f"[INST] <<SYS>>\n{DEFAULT_SYSTEM_PROMPT }\n<</SYS>>\n\n",
+        "history_template": "{user}[/INST]{assistant}[INST]",
+        "current_message_template": "{user} [/INST]{assistant}",
+        "tokenizer_kwargs": {"add_special_tokens": False},
+        "partial_text_processor": llama_partial_text_processor,
+        "rag_prompt_template": f""" [INST] {DEFAULT_RAG_PROMPT } [/INST] """
+        + """
+        [INST] Question: {question}
+        Context: {context}
+        Answer: [/INST]""",
+    },
 },
 "Chinese":{
     "qwen1.5-0.5b-chat": {
         "model_id":
@@ -192,26 +192,6 @@ def internlm_partial_text_processor(partial_text, new_text):
         "remote": False,
         "start_message": DEFAULT_SYSTEM_PROMPT_CHINESE,
         "stop_tokens": ["<|im_end|>", "<|endoftext|>"],
-        "rag_prompt_template": f"""<|im_start|>system
-        {DEFAULT_RAG_PROMPT_CHINESE }<|im_end|>"""
-        + """
-        <|im_start|>user
-        问题: {question}
-        已知内容: {context}
-        回答: <|im_end|><|im_start|>assistant""",
-    },
-    "qwen1.5-1.8b-chat": {
-        "model_id": "Qwen/Qwen-1_8B-Chat",
-        "remote": False,
-        "start_message": DEFAULT_SYSTEM_PROMPT_CHINESE,
-        "stop_tokens": ["<|im_end|>", "<|endoftext|>"],
-        "rag_prompt_template": f"""<|im_start|>system
-        {DEFAULT_RAG_PROMPT_CHINESE }<|im_end|>"""
-        + """
-        <|im_start|>user
-        问题: {question}
-        已知内容: {context}
-        回答: <|im_end|><|im_start|>assistant""",
     },
     "qwen1.5-7b-chat": {
         "model_id": "Qwen/Qwen1.5-7B-Chat",
         "remote": False,
         "start_message": DEFAULT_SYSTEM_PROMPT_CHINESE,
         "stop_tokens": ["<|im_end|>", "<|endoftext|>"],
@@ -274,12 +254,6 @@ def internlm_partial_text_processor(partial_text, new_text):
         "remote": False,
         "start_message": DEFAULT_SYSTEM_PROMPT_CHINESE,
         "stop_tokens": [2],
-        "rag_prompt_template": f"""{DEFAULT_RAG_PROMPT_CHINESE }"""
-        + """
-        问题: {question}
-        已知内容: {context}
-        回答:
-        """,
     },
     "internlm2-chat-1.8b": {
         "model_id": "internlm/internlm2-chat-1_8b",
         "remote": False,
@@ -288,6 +262,12 @@ def internlm_partial_text_processor(partial_text, new_text):
         "start_message": DEFAULT_SYSTEM_PROMPT_CHINESE,
         "stop_tokens": [2, 92542],
         "partial_text_processor": internlm_partial_text_processor,
+    },
+    "qwen1.5-1.8b-chat": {
+        "model_id": "Qwen/Qwen1.5-1.8B-Chat",
+        "remote": False,
+        "start_message": DEFAULT_SYSTEM_PROMPT_CHINESE,
+        "stop_tokens": ["<|im_end|>", "<|endoftext|>"],
         "rag_prompt_template": f"""<|im_start|>system
         {DEFAULT_RAG_PROMPT_CHINESE }<|im_end|>"""
         + """
@@ -295,7 +275,7 @@ def internlm_partial_text_processor(partial_text, new_text):
         问题: {question}
         已知内容: {context}
         回答: <|im_end|><|im_start|>assistant""",
-    },
+    },
 },
 "Japanese":{
     "youri-7b-chat": {
diff --git a/notebooks/254-llm-chatbot/ov_llm_model.py b/notebooks/254-llm-chatbot/ov_llm_model.py
index 9b7444edc5b..10760dd79dd 100644
--- a/notebooks/254-llm-chatbot/ov_llm_model.py
+++ b/notebooks/254-llm-chatbot/ov_llm_model.py
@@ -208,25 +208,6 @@ class OVCHATGLMModel(OVModelForCausalLM):
     """
     Optimum intel compatible model wrapper for CHATGLM2
     """
-
-    def __init__(
-        self,
-        model: "Model",
-        config: "PretrainedConfig" = None,
-        device: str = "CPU",
-        dynamic_shapes: bool = True,
-        ov_config: Optional[Dict[str, str]] = None,
-        model_save_dir: Optional[Union[str, Path]] = None,
-        **kwargs,
-    ):
-        NormalizedConfigManager._conf["chatglm"] = NormalizedTextConfig.with_args(
-            num_layers="num_hidden_layers",
-            num_attention_heads="num_attention_heads",
-            hidden_size="hidden_size",
-        )
-        super().__init__(
-            model, config, device, dynamic_shapes, ov_config, model_save_dir, **kwargs
-        )

     def _reshape(self, model: "Model", *args, **kwargs):
         shapes = {}
@@ -243,68 +224,12 @@ def _reshape(self, model: "Model", *args, **kwargs):
         shapes[inputs][1] = -1
         model.reshape(shapes)
         return model
-
-    @classmethod
-    def _from_pretrained(
-        cls,
-        model_id: Union[str, Path],
-        config: PretrainedConfig,
-        use_auth_token: Optional[Union[bool, str, None]] = None,
-        revision: Optional[Union[str, None]] = None,
-        force_download: bool = False,
-        cache_dir: Optional[str] = None,
-        file_name: Optional[str] = None,
-        subfolder: str = "",
-        from_onnx: bool = False,
-        local_files_only: bool = False,
-        load_in_8bit: bool = False,
-        **kwargs,
-    ):
-        model_path = Path(model_id)
-        default_file_name = OV_XML_FILE_NAME
-        file_name = file_name or default_file_name
-
-        model_cache_path = cls._cached_file(
-            model_path=model_path,
-            use_auth_token=use_auth_token,
-            revision=revision,
-            force_download=force_download,
-            cache_dir=cache_dir,
-            file_name=file_name,
-            subfolder=subfolder,
-            local_files_only=local_files_only,
-        )
-
-        model = cls.load_model(model_cache_path, load_in_8bit=load_in_8bit)
-        init_cls = OVCHATGLMModel
-
-        return init_cls(
-            model=model, config=config, model_save_dir=model_cache_path.parent, **kwargs
-        )


 class OVQWENModel(OVModelForCausalLM):
     """
     Optimum intel compatible model wrapper for QWEN
     """
-
-    def __init__(
-        self,
-        model: "Model",
-        config: "PretrainedConfig" = None,
-        device: str = "CPU",
-        dynamic_shapes: bool = True,
-        ov_config: Optional[Dict[str, str]] = None,
-        model_save_dir: Optional[Union[str, Path]] = None,
-        **kwargs,
-    ):
-        NormalizedConfigManager._conf["qwen"] = NormalizedTextConfig.with_args(
-            num_layers="num_hidden_layers",
-            num_attention_heads="num_attention_heads",
-            hidden_size="hidden_size",
-        )
-        super().__init__(
-            model, config, device, dynamic_shapes, ov_config, model_save_dir, **kwargs
-        )

     def _reshape(self, model: "Model", *args, **kwargs):
         shapes = {}
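
Note on the model-selection change in 254-rag-chatbot.ipynb: the cell stops listing every entry of SUPPORTED_LLM_MODELS and keeps only models whose config defines a rag_prompt_template, so the RAG demo can no longer be pointed at a model it has no RAG prompt for. A minimal, runnable sketch of that filter follows; the two config entries are abbreviated stand-ins for the real ones in llm_config.py, not copies.

# Illustrative config shape; real entries live in llm_config.py.
SUPPORTED_LLM_MODELS = {
    "Chinese": {
        "qwen1.5-1.8b-chat": {
            "model_id": "Qwen/Qwen1.5-1.8B-Chat",
            "rag_prompt_template": "<|im_start|>system ... {question} ... {context} ...",
        },
        "baichuan2-7b-chat": {
            "model_id": "baichuan-inc/Baichuan2-7B-Chat",  # assumed id, for illustration
            # no "rag_prompt_template": this diff removes it for this entry
        },
    },
}

language = "Chinese"

# Old behavior: every model id, RAG-capable or not.
all_ids = list(SUPPORTED_LLM_MODELS[language])

# New behavior, as in the diff: dict.get() returns None for a missing key,
# so entries without a rag_prompt_template drop out of the dropdown.
llm_model_ids = [
    model_id
    for model_id, model_config in SUPPORTED_LLM_MODELS[language].items()
    if model_config.get("rag_prompt_template")
]

print(all_ids)        # ['qwen1.5-1.8b-chat', 'baichuan2-7b-chat']
print(llm_model_ids)  # ['qwen1.5-1.8b-chat']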
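
The rag_prompt_template strings being moved between entries in llm_config.py are f-strings evaluated at import time: the system prompt (DEFAULT_RAG_PROMPT_CHINESE) is baked in immediately, and only {question} and {context} survive as runtime slots for the retrieval chain. A sketch of how such a template is consumed; the PromptTemplate wiring matches the notebook's langchain dependency, but the exact consuming code is not part of this diff, and the DEFAULT_RAG_PROMPT_CHINESE value here is a stand-in.

from langchain.prompts import PromptTemplate

DEFAULT_RAG_PROMPT_CHINESE = "基于以下已知内容，简洁并专业地回答用户的问题。"  # stand-in value

# As in llm_config.py: the f-string resolves the system prompt now, leaving
# {question} and {context} as the only placeholders for the RAG chain.
rag_prompt_template = f"""<|im_start|>system
{DEFAULT_RAG_PROMPT_CHINESE }<|im_end|>""" + """
<|im_start|>user
问题: {question}
已知内容: {context}
回答: <|im_end|><|im_start|>assistant"""

prompt = PromptTemplate.from_template(rag_prompt_template)
print(prompt.input_variables)  # ['context', 'question']
print(prompt.format(question="OpenVINO 是什么?", context="OpenVINO 是一个推理工具套件。"))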
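
On the ov_llm_model.py side, the deleted __init__ overrides existed only to register a NormalizedTextConfig for the remote-code "chatglm" and "qwen" architectures, so optimum-intel could look up layer count, head count, and hidden size under those configs' non-standard attribute names. Condensed, the removed hook was the sketch below; reading the removal as "transformers>=4.38.1 plus current optimum no longer need the manual mapping" is an inference from the pip hunk, not something the diff states.

from optimum.utils import NormalizedConfigManager, NormalizedTextConfig

# Verbatim shape of the removed registration: map optimum's canonical field
# names onto the attribute names the qwen config actually exposes.
NormalizedConfigManager._conf["qwen"] = NormalizedTextConfig.with_args(
    num_layers="num_hidden_layers",
    num_attention_heads="num_attention_heads",
    hidden_size="hidden_size",
)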