This NVIDIA AI blueprint shows developers how to build an application that transforms PDFs into engaging audio content. Built on NVIDIA NIM, this blueprint is flexible, and can run securely on a private network, delivering actionable insight without sharing sensitive data.
The blueprint accepts a Target PDF and, optionally, multiple Context PDFs. The Target PDF is the main source of information for the generated transcript, while the Context PDFs serve as additional references for the agent. The user can also optionally specify a guide prompt to focus the agent-generated transcript (e.g., "Focus on the key drivers for NVIDIA's Q3 earnings report").
For more information about the PDF, Agent, and TTS service flows, refer to the mermaid diagram.
- NVIDIA NIM microservices
- Response generation (Inference)
- Document ingest and extraction - Docling
- Text-to-speech - ElevenLabs
- Redis
- Storage - MinIO
Note: Since NVIDIA blueprints are adaptable to your specific business use case and/or infrastructure, the above software components are configurable. For example, to decrease the amount of GPU memory required, you can leverage a smaller Llama 3.1-8B NIM and disable GPU usage for Docling in docker-compose.yaml.
Docker Compose scripts are provided that spin up the microservices on a single node. The blueprint contains sample use-case PDFs, but developers can build upon it by using their own PDFs for their specific use case.
Below are the hardware requirements; they depend on how you choose to deploy the blueprint. There are two ways to deploy it:
- Default - Use NVIDIA API catalog NIM endpoints
  - Can run on any non-GPU-accelerated machine/VM:
    - 8 CPU cores
    - 64 GB RAM
    - 100 GB disk space
  - A public IP address is also required
- Locally host NVIDIA NIM
Note: To run the blueprint at scale and for faster preprocessing of PDFs, it is recommended to use a GPU for the PDF ingest/extraction pipeline.
- An NVIDIA AI Enterprise developer license is required to locally host NVIDIA NIM.
- API catalog keys:
- NVIDIA API catalog or NGC
- ElevenLabs
- Docker Compose
System requirements: an Ubuntu 20.04 or 22.04 based machine with sudo privileges
Install software requirements:
- Install Docker Engine and Docker Compose. Refer to the instructions for Ubuntu.
- Ensure the Docker Compose plugin version is 2.29.1 or higher.
- Run docker compose version to confirm.
- Refer to Install the Compose plugin in the Docker documentation for more information.
- To configure Docker for GPU-accelerated containers, install the NVIDIA Container Toolkit.
- Install git
- Obtain API keys:
NVIDIA Inference Microservices (NIM)
- There are two possible methods to generate an API key for NIM:
  - NVIDIA Build portal:
    - Sign in to the NVIDIA Build portal with your email.
    - Click on any model, then click "Get API Key", and finally click "Generate Key".
  - NVIDIA NGC portal:
    - Sign in to the NVIDIA NGC portal with your email.
    - Select your organization from the dropdown menu after logging in. You must select an organization which has NVIDIA AI Enterprise (NVAIE) enabled.
    - Click on your account in the top right and select "Setup" from the dropdown.
    - Click the "Generate Personal Key" option and then the "+ Generate Personal Key" button (or the "Generate API Key" option and then the "+ Generate API Key" button) to create your API key.

IMPORTANT: This key will be used in the NVIDIA_API_KEY environment variable below.
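Before continuing, you can sanity-check the prerequisites above. The commands below are a sketch: the GPU check only applies if you have installed the NVIDIA Container Toolkit, and the curl call assumes your key works against the NVIDIA API catalog's OpenAI-compatible endpoint, with meta/llama-3.1-8b-instruct used purely as an example model.

```bash
# Confirm the Docker Compose plugin is 2.29.1 or newer
docker compose version

# Optional: confirm GPU-accelerated containers work (requires the NVIDIA Container Toolkit)
docker run --rm --gpus all ubuntu nvidia-smi

# Sanity-check the NVIDIA API key against the API catalog (export it in your shell first)
export NVIDIA_API_KEY="your_key"
curl -s https://integrate.api.nvidia.com/v1/chat/completions \
  -H "Authorization: Bearer $NVIDIA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "meta/llama-3.1-8b-instruct", "messages": [{"role": "user", "content": "ping"}], "max_tokens": 8}'
```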
- Clone the repo

```bash
git clone https://github.com/NVIDIA-AI-Blueprints/pdf-to-podcast
```
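After cloning, change into the repository directory (the directory name is assumed from the repository URL):

```bash
cd pdf-to-podcast
```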
- Set environment variables

```bash
# Create .env file with the required variables
echo "ELEVENLABS_API_KEY=your_key" >> .env
echo "NVIDIA_API_KEY=your_key" >> .env
echo "MAX_CONCURRENT_REQUESTS=1" >> .env
```

Note: the ElevenLabs API key can handle concurrent requests, but for local development, set MAX_CONCURRENT_REQUESTS=1 to avoid rate-limiting issues.
- Install dependencies
We use UV to manage Python dependencies.
make uv
This will:
- Install UV if not present
- Create virtual environment
- Install project dependencies
If you open up a new terminal window and want to quickly re-use the same environment, you can run make uv again.
- **Start the development server**
make all-services
Note: The first time you run make all-services, the docling service may take 10-15 minutes to pull and build. Subsequent runs will be much faster.
This command will:
- Verify environment variables are set
- Create necessary directories
- Start all services using Docker Compose with the --build flag
Note: Set DETACH=1 to run the services in detached mode so you can continue using your terminal while the services are running.
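For example, assuming the Makefile picks DETACH up from the environment:

```bash
DETACH=1 make all-services
```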
- View Swagger API documentation
You can view the Swagger UI for the API locally at localhost:8002/docs. If running on a VM, you will need to port-forward 8002 to your local machine or expose the port on the VM.
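For example, one way to forward the port over SSH from your workstation (username and host are placeholders):

```bash
ssh -L 8002:localhost:8002 <user>@<vm-ip>
```

The Swagger UI is then available at http://localhost:8002/docs on your workstation.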
- Generate the Podcast

```bash
source .venv/bin/activate
python tests/test.py --target <pdf1.pdf> --context <pdf2.pdf>
```
By default, this command will generate a 2-person podcast. To generate a 1-person podcast, add the --monologue flag.
IMPORTANT: By default, test.py expects PDFs to be in the samples directory.
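For example, a single-speaker run might look like this (the PDF filenames are placeholders for files you have placed in the samples directory):

```bash
python tests/test.py --target <target.pdf> --context <context.pdf> --monologue
```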
If an error occurs during the test, view the docker compose logs to debug it.
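For example (service names vary; use docker compose ps to list them):

```bash
# List the running services and their status
docker compose ps

# Follow the logs of a specific service
docker compose logs -f <service-name>
```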
- Host the PDF service on a separate machine.
This blueprint uses docling as the default PDF extraction service.
To run the PDF extraction service on a separate machine, add the following to your .env file:
echo "MODEL_API_URL=<pdf-model-service-url" >> .env
The make model-dev target will let you spin up only the docling service.
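A minimal sketch of a split deployment, assuming the extraction machine is reachable from the main host; the URL and port are placeholders that should match however the docling service is exposed:

```bash
# On the (ideally GPU-equipped) machine that will run PDF extraction
make model-dev

# On the machine running the rest of the blueprint
echo "MODEL_API_URL=http://<pdf-machine-ip>:<port>" >> .env
make all-services
```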
- Use Self-hosted NIM
By default, this blueprint uses an ensemble of three LLMs to generate podcasts. The example uses the Llama 3.1-8B, Llama 3.1-70B, and Llama 3.1-405B NIMs for balanced performance and accuracy. To use a different model, update the models.json file with the desired model. The default models.json calls NVIDIA-hosted API catalog endpoints; this configuration is recommended for most users getting started with the blueprint, but once you want to adapt it, locally hosted NIM endpoints are required.
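As a rough sketch of local hosting, a single Llama 3.1-8B NIM can typically be started with Docker along the lines below; the image tag, cache path, and port follow the general NIM documentation and may differ for your setup, and models.json would then need to point at the local endpoint:

```bash
# Log in to nvcr.io using your NGC API key (the username is literally $oauthtoken)
echo "$NVIDIA_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin

# Start a local Llama 3.1-8B NIM (example image and settings)
docker run -d --gpus all --shm-size=16GB \
  -e NGC_API_KEY="$NVIDIA_API_KEY" \
  -v ~/.cache/nim:/opt/nim/.cache \
  -p 8000:8000 \
  nvcr.io/nim/meta/llama-3.1-8b-instruct:latest
```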
- Change the Default Models and GPU Assignments
It is easy to swap out different pieces of the stack to optimize GPU usage for available hardware. For example, minimize GPU usage by swapping in the smaller Llama 3.1-8B NIM and disabling GPU usage for docling in docker-compose.yaml.
- Enable Tracing
We expose a Jaeger instance at http://localhost:16686/ for tracing. This is useful for debugging and monitoring the system.
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests: python tests/test.py
- Run linting: make ruff
- Submit a pull request
The project uses ruff for linting and formatting. You must run make ruff before your PR can be merged:
make ruff # Runs both lint and format
We use GitHub Actions for CI/CD. We run the following actions:
- ruff: Runs linting and formatting
- pr-test: Runs an end-to-end podcast test on the PR
- build-and-push: Builds and pushes a new container image to the remote repo. This is used to update production deployments.
Important: This setup uses HTTP and is not intended for production deployments. For production deployments, consider implementing the following security measures:
- Add SSL/TLS encryption by either:
- Configuring uvicorn with SSL certificates (see the sketch after this list)
- Setting up a reverse proxy (like Nginx) to handle SSL termination
- Implement proper certificate management
- Configure appropriate security headers
- Follow other web security best practices
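For instance, uvicorn can terminate TLS directly via its --ssl-keyfile and --ssl-certfile options; the module path below is a placeholder rather than the blueprint's actual entry point:

```bash
uvicorn <module>:app --host 0.0.0.0 --port 8002 \
  --ssl-keyfile /path/to/key.pem \
  --ssl-certfile /path/to/cert.pem
```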