Speakeasy GPT

Speakeasy GPT is a Jupyter notebook that utilizes several natural language processing utilities to provide a seamless and low-latency speech interface to ChatGPT and other large language models.

Voice prompts are transcribed using OpenAI's whisper model, run locally on CPU or GPU. The transcription is sent as a prompt to the OpenAI gpt-3.5.turbo API. The response is synthesized into speech by several text to speech engines, including ElevenLabs' API, Mimic 3, and Coqui TTS.

Installation and Dependencies

Mount Drive: Mount the Google Drive to access the notebook files.
Installs: Install the required dependencies and packages, including espeak, ElevenLabs, Mimic 3 TTS, TTS, ffmpeg-python, pydub, and OpenAI.
Imports: Import the necessary libraries and modules for audio processing, natural language processing, and TTS.

Usage

Mount Drive: Mount the Google Drive to access the notebook files.
Installation: Install the required dependencies and packages by running the provided installation commands.
Imports: Import the necessary libraries and modules for audio processing, natural language processing, and TTS.
Check CUDA: Check if CUDA is available for GPU acceleration.
Load Whisper: Load the whisper model for speech-to-text transcription using OpenAI's whisper model.
Load Mimic TTS: Load the Mimic 3 Text-to-Speech System for generating speech using the Mimic 3 TTS engine.
Load Coqui TTS: Load the Coqui TTS engine for generating speech using the Coqui TTS model.
Load ElevenLabs: Load the ElevenLabs API for generating speech using the ElevenLabs TTS model.
Load ChatGPT: Set up the OpenAI API for ChatGPT and define the initial system message.
Record prompt from microphone: Record audio prompts from the microphone and save them as WAV files.
Transcribe prompt audio to text prompt: Transcribe the audio prompts to text using the whisper model.
Prompt ChatGPT: Send the text prompts to the ChatGPT model for generating responses.
Generate response audio: Generate speech audio for the ChatGPT responses using the selected TTS engine.
Full loop - ElevenLabs TTS: Perform the full loop of transcription, ChatGPT prompt, and response audio generation using the ElevenLabs TTS engine.
Full loop - Coqui TTS: Perform the full loop of transcription, ChatGPT prompt, and response audio generation using the Coqui TTS engine.
Full loop - Mimic TTS: Perform the full loop of transcription, ChatGPT prompt, and response audio generation using the Mimic 3 TTS engine.

Note: Modify the code and parameters as needed for your specific use case.

License

This code is licensed under the Creative Commons Attribution-NonCommercial (CC BY-NC) license, allowing for non-commercial use and modification with proper attribution. See the license here: https://creativecommons.org/licenses/by-nc/2.0/

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
README.md		README.md
speakeasy_gpt_py.ipynb		speakeasy_gpt_py.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Speakeasy GPT

Installation and Dependencies

Usage

License

About

Releases

Packages

Languages

astrologos/py-speakeasy

Folders and files

Latest commit

History

Repository files navigation

Speakeasy GPT

Installation and Dependencies

Usage

License

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages