update gradio UI
openvino-dev-samples committed Nov 4, 2024
1 parent e09b544 commit 9dfa8c4
Showing 2 changed files with 58 additions and 23 deletions.
2 changes: 1 addition & 1 deletion notebooks/multimodal-rag/README.md
@@ -4,7 +4,7 @@ Constructing a RAG pipeline for text is relatively straightforward, thanks to th

To build a truly multimodal search for videos, you need to work with the different modalities of a video, such as spoken content and visual content. In this notebook, we showcase a Multimodal RAG pipeline designed for video analytics. It uses the Whisper model to convert spoken content to text, the CLIP model to generate multimodal embeddings, and a Vision Language Model (VLM) to process retrieved images and text messages. The following picture illustrates how this pipeline works.

-![Multimodal RAG](https://github.com/user-attachments/assets/baef4914-5c07-432c-9363-1a0cb5944b09)
+![image](https://github.com/user-attachments/assets/fb3ec06f-e4b0-4ca3-aac6-71465ae14808)

## Notebook contents
The tutorial consists of the following steps:
79 changes: 57 additions & 22 deletions notebooks/multimodal-rag/multimodal-rag-llamaindex.ipynb

Large diffs are not rendered by default.
