GitHub - NVIDIA-AI-Blueprints/multimodal-pdf-data-extraction: NVIDIA AI Blueprint for multimodal PDF data extraction for enterprise RAG

NVIDIA AI Blueprint: Multimodal PDF Data Extraction

Rapidly ingest massive volumes of PDF documents. Extract text, graphs, charts, and tables for highly accurate retrieval.

Introduction

This blueprint is based on NVIDIA Ingest, a scalable, performance-oriented microservice for extracting document content and metadata. It supports parsing PDF, Word, and PowerPoint documents, and uses specialized NVIDIA NIM microservices to find, contextualize, and extract text, tables, charts, and images for use in downstream generative applications.

NVIDIA Ingest enables parallel document splitting to rapidly extract data from many documents at the same time.
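
To illustrate the general idea of parallel document splitting, here is a minimal Python sketch. It is not the NVIDIA Ingest API: `split_into_chunks`, `extract_chunk`, and `ingest` are hypothetical names, and the extraction step is a placeholder that only records which pages a worker would handle. The sketch assumes each document is described by its page count and is split into fixed-size page ranges that a thread pool processes concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(num_pages, pages_per_chunk):
    """Return (start, end) page ranges covering the document; end is exclusive."""
    return [
        (start, min(start + pages_per_chunk, num_pages))
        for start in range(0, num_pages, pages_per_chunk)
    ]


def extract_chunk(doc_name, page_range):
    """Hypothetical stand-in for a real extraction call on one page range."""
    start, end = page_range
    return {"doc": doc_name, "pages": list(range(start, end))}


def ingest(documents, pages_per_chunk=2, max_workers=4):
    """Split every document into page ranges and process all ranges in parallel.

    `documents` maps a document name to its page count (an assumption made
    for this sketch; a real pipeline would read the files themselves).
    """
    tasks = [
        (name, page_range)
        for name, num_pages in documents.items()
        for page_range in split_into_chunks(num_pages, pages_per_chunk)
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the submitted tasks.
        return list(pool.map(lambda task: extract_chunk(*task), tasks))


if __name__ == "__main__":
    for result in ingest({"report.pdf": 5, "slides.pdf": 3}):
        print(result)
```

Splitting by page range lets chunks from many documents share the same worker pool, so one large document does not block the smaller ones queued behind it.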

Get Started

Apply for Early Access.
Follow the getting started documentation here.

NOTE: The downloadable blueprint deploys only the document ingestion pipeline. It does not include a retrieval pipeline.
