ElevenLabsTTSService and PlayHTTTSService interruptions don't occur with long LLM completions #950
Comments
Sometimes I get a sort of "inverse" behavior: it stops generating new LLM frames but keeps playing the TTS that has already been generated.
I'm observing similar behaviors. What OS are you running this on? I have no issue on Mac, then weirdness on Windows. Sometimes when this happens, I see the transcript for the interruption come through, because it is printed in the terminal like yours in the output above, so I know the STT service picked it up, but then that frame doesn't appear to register with the rest of the pipeline. Finally, when it does register what I said, it only includes the last statement; the earlier STT frames that the service transcribed aren't included in the message history that is sent to the LLM step. So those frames get lost somewhere in the process.
I'm on macOS. I also observe that it's inconsistent. I can change the behavior by slimming the pipeline down, but ultimately I can still trigger this even with slimmed versions (removing custom processors).
In some more testing, it seems the rule may be that I can't interrupt until it has finished speaking its first sentence; after that, interruptions work as expected.
No, I can trigger this outside the first sentence, and if it’s only generating one sentence I can interrupt without getting this behavior.
:/ I keep hoping to uncover some general rule running beneath the thing... something to explain the apparently inconsistent performance. I'm still exploring fixes.
Hi all, @aconchillo and I have confirmed that this issue affects In the meantime, interruptions work well with CartesiaTTSService, which uses a WebSocket-based connection with excellent TTFB (consistently ~170ms). Additionally, all HTTP-based TTS services support interruptions. You can check out a list of available services here: https://docs.pipecat.ai/server/services/supported-services#text-to-speech. Solving this problem is a high priority for us, and we will provide updates in this issue as they become available. Details:
Description
If the LLM is still generating sentences when an interruption occurs (e.g. after "tell me a long story"), any currently queued TTS messages are aborted, but the LLM continues to generate sentences, so the bot starts speaking again.
This makes the experience SUPER painful to users, causing them to want to scream at the bot until it shuts up.
Environment
Repro steps
Ask the bot to say something long, then interrupt it while it's still generating messages.
Expected behavior
LLM stops generating when interrupted.
Actual behavior
LLM keeps generating, pushing more TTS frames and causing weird skipping.
Logs
The following logs mix our custom logging and processors, but they accompany the recording so you can see what I mean.
Each "Got frame StartInterruptionFrame" line is where we're sending a proper interruption. As you can see in the attached video and the logs below, we send audio frames between "Generating TTS" logs, which means the LLM is still generating frames. Because the LLM is not interrupted, you can hear that previously generated frames are skipped, but the newly generated ones still play their TTS despite the interruption.
tangia_dantest.2025-01-08.20_19_02.mp4