Readme Updates #1

Draft: wants to merge 5 commits into base `main`
130 changes: 128 additions & 2 deletions README.md
@@ -14,9 +14,135 @@ This blueprint is based on [NVIDIA-Ingest](https://github.com/NVIDIA/nv-ingest)

NVIDIA Ingest enables parallel document splitting to rapidly extract data from many documents at the same time.
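
As a rough sketch of what batch submission looks like with the NV-Ingest CLI (flags adapted from the NV-Ingest README; the file paths are placeholders and option names may differ across versions, so verify against your installed `nv-ingest-cli`):

```
# Hypothetical batch submission; verify flag names against your installed nv-ingest-cli version.
nv-ingest-cli \
  --doc ./data/report_a.pdf \
  --doc ./data/report_b.pdf \
  --output_directory ./processed_docs \
  --task='extract:{"document_type": "pdf"}' \
  --client_host=localhost \
  --client_port=7670
```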

## Prerequisites

### Hardware

| GPU | Family | Memory | # of GPUs (min.) |
| ------ | ------ | ------ | ------ |
| H100 | SXM/NVLink or PCIe | 80GB | 2 |
| A100 | SXM/NVLink or PCIe | 80GB | 2 |
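
To confirm a system meets these requirements, GPU names and memory can be listed with `nvidia-smi`:

```
nvidia-smi --query-gpu=index,name,memory.total --format=csv
```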

## Get Started

1. Apply for [Early Access](https://developer.nvidia.com/nemo-microservices).
2. Follow the getting started documentation [here](https://github.com/NVIDIA/nv-ingest).
3. Once you have Early Access, log in to NGC, download the `Enterprise RAG - Docker workflow` resource, and then follow the instructions below.

* Install [Docker Engine and Docker Compose](https://docs.docker.com/engine/install/ubuntu/).

* Verify NVIDIA GPU driver version 535 or later is installed.

```
$ nvidia-smi --query-gpu=driver_version --format=csv,noheader
550.90.07

$ nvidia-smi -q -d compute

==============NVSMI LOG==============

Timestamp                                 : Thu Oct 11 21:17:25 2024
Driver Version : 550.90.07
CUDA Version : 12.4

Attached GPUs : 2
GPU 00000000:CA:00.0
Compute Mode : Default
GPU 00000000:CC:00.0
Compute Mode : Default
```

Refer to the [NVIDIA Linux driver installation instructions](https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html) for more information.

* Install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
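
  For reference, a minimal install-and-configure sequence on Ubuntu looks like the following (a sketch assuming the NVIDIA apt repository is already set up; `--set-as-default` writes the `default-runtime` entry shown below, though flag availability may vary by toolkit version):

```
# Install the toolkit and register the NVIDIA runtime as Docker's default
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker --set-as-default
sudo systemctl restart docker
```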

Verify the toolkit is installed and configured as the default container runtime.


```
$ cat /etc/docker/daemon.json
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

$ sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi -L
GPU 0: NVIDIA A100 80GB PCIe (UUID: GPU-d8ce95c1-12f7-3174-6395-e573163a2ace)
GPU 1: NVIDIA A100 80GB PCIe (UUID: GPU-49902a43-6199-5249-02c6-19515fc0cc56)
```


* Create an NGC account and API Key. Refer to the [instructions](https://docs.nvidia.com/ngc/gpu-cloud/ngc-overview/index.html) to create an account and generate an NGC API key.

The key is used to download the containers for the models mentioned above, which are required by the NV-Ingest microservice.

Export the `NGC_API_KEY`:

```
export NGC_API_KEY=<ngc-api-key>
```

Log in to the NVIDIA container registry using the following command:

```
docker login nvcr.io
```
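
When prompted, use `$oauthtoken` as the username and the NGC API key as the password. A non-interactive equivalent:

```
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
```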

* Run the pipeline

`NOTE:` The example requires at least 4xA100 GPUs to deploy all the required models locally. If you are using a 4xA100 system, ensure that the NIM LLM microservice runs on a dedicated GPU by following the steps below.
> **Review comment:** 4xA100? Also, is this conflicting with the HW req stated above?



```
cd enterprise-rag-docker_v2x.x.x/
```

a. Uncomment the `device_ids` key in `docker-compose-nim-ms.yaml` for `nemollm-inference` and comment out the `count` key.

```
services:
  nemollm-inference:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              # count: ${INFERENCE_GPU_COUNT:-all}
              device_ids: ['${LLM_MS_GPU_ID:-0}']
              capabilities: [gpu]
```

b. Then set the following environment variable:

```
export LLM_MS_GPU_ID=3
```
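
If you are unsure which GPU index to dedicate to the LLM (3 above is just an example), `nvidia-smi -L` lists the available GPUs with their indices:

```
# List GPU indices and UUIDs; choose one not assigned to other services
nvidia-smi -L
```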

c. Run the following commands to launch the pipeline:

```
cd rag-app-multimodal-chatbot-nvingest/
USERID=$(id -u) docker compose --profile local-nim --profile milvus up -d
```

* Check the status of the containers.

```
docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"

CONTAINER ID NAMES STATUS
3c826374c81e rag-playground Up 5 hours
bd49b820e141 rag-application-multimodal-chatbot-nvingest Up 5 hours
33e0dff2dcab milvus-standalone Up 5 hours
1cc2502644db rag-app-multimodal-chatbot-nvingest-cached-1 Up 5 hours
674352dbc275 rag-app-multimodal-chatbot-nvingest-yolox-1 Up 5 hours
4ccfa1c34489 nemo-retriever-embedding-microservice Up 5 hours (healthy)
c78de130d7cc rag-app-multimodal-chatbot-nvingest-paddle-1 Up 5 hours
1d554a3b3ce1 rag-app-multimodal-chatbot-nvingest-deplot-1 Up 5 hours
2bfbb9d78151 nemollm-inference-microservice Up 5 hours (healthy)
b77e00089678 rag-app-multimodal-chatbot-nvingest-redis-1 Up 5 hours
d3d590291ccd rag-app-multimodal-chatbot-nvingest-nv-ingest-ms-runtime-1 Up 5 hours
17f0f0fedc47 milvus-minio Up 5 hours (healthy)
8663f4f9faeb milvus-etcd Up 5 hours (healthy)
```
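
To check just the health state of the two NIM microservices, `docker inspect` can query Docker's reported health directly (container names taken from the listing above):

```
docker inspect --format '{{.Name}}: {{.State.Health.Status}}' \
  nemollm-inference-microservice nemo-retriever-embedding-microservice
```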

* Open your browser and interact with the RAG Playground at http://localhost:3001.

### Next Steps

**NOTE:** The downloadable blueprint deploys the document ingestion pipeline. It does not include a retrieval pipeline.
Refer to the [Notebooks](./notebooks) to evaluate the Multimodal RAG with LangChain.


> **Review comment:** *evaluate

18 changes: 18 additions & 0 deletions notebooks/README.md
@@ -0,0 +1,18 @@
# Notebooks

Once you've deployed Multimodal-RAG-with-NV-Ingest-PDF-Extraction, follow the steps below to run the Multimodal RAG with LangChain.

### Prerequisites

Complete the [Get Started](../README.md) steps before proceeding.

### Launch Notebook

Run the following commands to launch the example notebook:

```
cd multimodal-pdf-data-extraction/notebooks
docker compose up -d
```

Access the notebook in your browser at `http://<host-ip>:8888`.
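
To verify the notebook server is reachable before opening the browser (it may take a minute while Jupyter installs on first start):

```
# Expect HTTP 200 once Jupyter is up; -L follows the initial redirect
curl -sL -o /dev/null -w "%{http_code}\n" http://localhost:8888
```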
38 changes: 38 additions & 0 deletions notebooks/docker-compose.yaml
@@ -0,0 +1,38 @@
```
version: '3'
services:
  nv-ingest-ms-notebook:
    image: nvcr.io/ohlfw0olaadg/ea-participants/nv-ingest:24.08
    ports:
      - "8888:8888"
    cap_add:
      - sys_nice
    environment:
      - CACHED_GRPC_ENDPOINT=cached:8001
      - CACHED_HTTP_ENDPOINT=""
      - DEPLOT_GRPC_ENDPOINT=""
      - DEPLOT_HTTP_ENDPOINT=http://deplot:8000/v1/chat/completions
      - DOUGHNUT_GRPC_TRITON=triton-doughnut:8001
      - INGEST_LOG_LEVEL=INFO
      - MESSAGE_CLIENT_HOST=redis
      - MESSAGE_CLIENT_PORT=6379
      - MINIO_BUCKET=${MINIO_BUCKET:-nv-ingest}
      - NGC_API_KEY=${NGC_API_KEY:-ngcapikey}
      - NVIDIA_BUILD_API_KEY=${NVIDIA_BUILD_API_KEY:-nvidiabuildkey}
      - OTEL_EXPORTER_OTLP_ENDPOINT=otel-collector:4317
      - PADDLE_GRPC_ENDPOINT=paddle:8001
      - PADDLE_HTTP_ENDPOINT=""
      - REDIS_MORPHEUS_TASK_QUEUE=morpheus_task_queue
      - TABLE_DETECTION_GRPC_TRITON=yolox:8001
      - TABLE_DETECTION_HTTP_TRITON=""
      - YOLOX_GRPC_ENDPOINT=yolox:8001
      - YOLOX_HTTP_ENDPOINT=""
      # Restrict the notebook container to a single GPU (index 3)
      - CUDA_VISIBLE_DEVICES=3
    volumes:
      - ./notebook:/workspace/notebooks
    command:
      - /bin/bash
      - -c
      - |
        # Install Jupyter at startup and serve the mounted notebooks without an auth token
        pip install notebook --quiet
        jupyter notebook --no-browser --allow-root --NotebookApp.token='' --ip 0.0.0.0 --notebook-dir=/workspace/notebooks
```

@@ -0,0 +1,6 @@
```
{
  "cells": [],
  "metadata": {},
  "nbformat": 4,
  "nbformat_minor": 5
}
```