This application demonstrates a real-time voice interaction system using the RTVI Client library with both the Gemini Multimodal Live and OpenAI RealTime WebRTC integrations. It enables two-way communication between users and the LLM, featuring voice input/output, text messaging, and various audio controls.
- Real-time voice interaction with a Gemini Multimodal Live bot
- Real-time voice interaction with an OpenAI RealTime bot
- Microphone input control and device selection
- Text-based message prompting
- Audio visualization through dynamic speech bubbles
- Comprehensive event handling system
- Connection state management
- Gemini API key (set as the environment variable `VITE_DANGEROUS_GEMINI_API_KEY`)
- OpenAI API key (set as the environment variable `VITE_DANGEROUS_OPENAI_API_KEY`)
- Modern web browser with WebSocket support
- Access to a microphone
```shell
# from the base folder
yarn
yarn workspaces run build
```

Then, from this example's folder:

```shell
npm i
cp env.example .env
# update .env with your API keys
npm run dev
```

Open http://localhost:5173?service=gemini or http://localhost:5173?service=openai.
- RTVI Client Documentation
- Gemini Multimodal Live Documentation
- OpenAI RealTime WebRTC Documentation
The application automatically initializes when the DOM content is loaded. It sets up:
- Audio device selection
- Microphone controls
- Bot connection management
- Event handlers
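The startup sequence above can be sketched as follows. The helper names are hypothetical stand-ins for the app's real setup code, shown only to illustrate the order of initialization:

```typescript
// Hypothetical helpers standing in for the app's real setup functions.
async function populateMicDeviceList(): Promise<void> { /* enumerate audio input devices */ }
function wireMicControls(): void { /* bind the mute/unmute buttons */ }
function wireBotConnectButton(): void { /* connect/disconnect the bot */ }
function registerEventHandlers(): void { /* subscribe to RTVI client events */ }

document.addEventListener("DOMContentLoaded", async () => {
  await populateMicDeviceList(); // audio device selection
  wireMicControls();             // microphone controls
  wireBotConnectButton();        // bot connection management
  registerEventHandlers();       // event handlers
});
```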
- Toggle Bot: Connect/disconnect the AI assistant
- Mute/Unmute: Control microphone input
- Microphone Selection: Choose input device
- Text Input: Send text messages to the bot
The application handles a range of events, including:
- Transport state changes
- Bot connection status
- Audio track management
- Speech detection
- Error handling
- Audio level visualization
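As a sketch, these handlers are registered on the client's event emitter. The `RTVIEvent` names below assume the RTVI client library's API and may differ between versions:

```typescript
// Package name and event names are assumptions based on the RTVI client SDK.
import { RTVIClient, RTVIEvent } from "@pipecat-ai/client-js";

declare const rtviClient: RTVIClient; // the connected client instance

rtviClient.on(RTVIEvent.TransportStateChanged, (state) => {
  console.log("transport state:", state);
});
rtviClient.on(RTVIEvent.BotConnected, () => console.log("bot connected"));
rtviClient.on(RTVIEvent.BotDisconnected, () => console.log("bot disconnected"));
rtviClient.on(RTVIEvent.UserStartedSpeaking, () => console.log("user speaking"));
rtviClient.on(RTVIEvent.Error, (message) => console.error("RTVI error:", message));
```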
```typescript
const RTVIConfig: RTVIClientOptions = {
  transport,
  params: {
    baseUrl: "api",
    requestData: {},
  },
  enableMic: true,
  enableCam: false,
  timeout: 30 * 1000, // 30 seconds
};
```
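With these options, the client is constructed and connected roughly as follows (assuming the RTVI client's `RTVIClient` constructor and `connect()` method; the package name is an assumption):

```typescript
import { RTVIClient } from "@pipecat-ai/client-js"; // package name is an assumption

const client = new RTVIClient(RTVIConfig);
await client.connect();
// ...interact with the bot...
// await client.disconnect();
```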
```typescript
const llm_service_options: GeminiLLMServiceOptions = {
  api_key: import.meta.env.VITE_DANGEROUS_GEMINI_API_KEY,
  model: "models/gemini-2.0-flash-exp",
  // ... additional configuration
};
```

For all service options and their defaults, see `GeminiLLMServiceOptions`.
```typescript
const llm_service_options: OpenAIServiceOptions = {
  api_key: import.meta.env.VITE_DANGEROUS_OPENAI_API_KEY,
  // ... additional configuration
};
```

For all service options and their defaults, see `OpenAIServiceOptions`.
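Tying this back to the `?service=` query parameter used when opening the app, transport selection might look like the sketch below. The transport class names and package names are assumptions, and in practice each transport takes its own service-options type:

```typescript
// Class and package names are assumptions; check the installed SDKs.
import { GeminiLiveWebsocketTransport } from "@pipecat-ai/gemini-live-websocket-transport";
import { OpenAIRealTimeWebRTCTransport } from "@pipecat-ai/openai-realtime-webrtc-transport";

// Pick the transport from the ?service= query parameter, defaulting to Gemini.
const service = new URLSearchParams(window.location.search).get("service") ?? "gemini";
const transport =
  service === "openai"
    ? new OpenAIRealTimeWebRTCTransport(llm_service_options)
    : new GeminiLiveWebsocketTransport(llm_service_options);
```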
- Gemini integration currently does not support transcripts
BSD 2-Clause