Text streaming support #5
Comments
Well, I fiddled with mine to do that, and it needs a redesign. It can't wait for the response from a chunk, so it has to spin off an async task to handle the waits and, if there is text, send it somewhere to be consolidated with the prior text and maybe signal done.
So I modified my ASR to do interim results on the fly, but as suspected it will take some work to figure out what to do with the audio data. Currently, for testing, if the transcriber returns text (not '') then I send that back and drop the audio input saved up to that point. The result should be "testing testing testing testing", but I got "test test Washington testing test", with a lot of no-text responses in between.
I don't know what my transcriber does under the covers.
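A minimal sketch of that redesign, assuming plain asyncio: the receive loop only enqueues chunks, and a background task does the waiting and consolidates any text. `fake_transcribe`, `consume_chunks`, and `run_demo` are hypothetical names for illustration, not part of any existing ASR server.

```python
import asyncio
from typing import List


async def fake_transcribe(chunk: bytes) -> str:
    """Hypothetical stand-in for a transcriber; returns '' when it has no text."""
    await asyncio.sleep(0)  # pretend to wait on the model
    return chunk.decode("utf-8").strip()


async def consume_chunks(queue: asyncio.Queue, parts: List[str]) -> str:
    """Background task: transcribe chunks as they arrive and consolidate the text."""
    while True:
        chunk = await queue.get()
        if chunk is None:  # sentinel standing in for "audio stopped"
            break
        text = await fake_transcribe(chunk)
        if text:  # ignore the no-text responses
            parts.append(text)
    return " ".join(parts)  # the consolidated, final text


async def run_demo() -> str:
    queue: asyncio.Queue = asyncio.Queue()
    parts: List[str] = []
    worker = asyncio.create_task(consume_chunks(queue, parts))
    for chunk in (b"testing", b"", b"testing", b" ", b"testing"):
        queue.put_nowait(chunk)  # the receive loop never blocks on the transcriber
    queue.put_nowait(None)  # signal done
    return await worker


if __name__ == "__main__":
    print(asyncio.run(run_demo()))
```

The sentinel on the queue plays the role of the "signal done" step; a real server would presumably put it there when the audio stream ends.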
I think this could be done with the appropriate start/stop/chunk events. So for ASR/STT response, it could be
The transcript is at the end. I think another parameter on Transcribe would indicate that the client is enabled for interim results. transcriptchunk implies the client is processing the chunks somehow, and it's unlikely that every client would change. The current whisper sends the results on audio stop, not transcript, anyhow.
But that doesn't tell the client whether the server will send interim results. Currently Transcribe doesn't have a response.
Maybe we could use the Describe/Info response to indicate whether the ASR supports intermediate responses. Then I suppose a new TranscriptChunk event out from the ASR would inform the client.
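As a rough sketch of that negotiation: the server advertises a capability flag in its info, and the client only expects chunk events when both sides support them. The `AsrInfo` class and the `supports_interim_results` field are assumptions made up for this example; nothing like them is confirmed to exist in the current Info response.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class AsrInfo:
    """Hypothetical slice of what an ASR might report about itself."""

    name: str
    supports_interim_results: bool = False  # assumed new field, off by default


def expect_transcript_chunks(info: Dict[str, Any], client_handles_chunks: bool) -> bool:
    """Use interim results only if the server offers them and the client can use them."""
    # Older servers would not send the field at all, so a missing key means "no".
    server_offers = bool(info.get("supports_interim_results", False))
    return server_offers and client_handles_chunks


if __name__ == "__main__":
    new_server = asdict(AsrInfo(name="example-asr", supports_interim_results=True))
    old_server = {"name": "example-asr"}
    print(expect_transcript_chunks(new_server, True))   # both sides agree
    print(expect_transcript_chunks(old_server, True))   # server predates the flag
```

Defaulting to false keeps existing clients and servers working unchanged, which matters if it's unlikely that every client would change.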
I did it with a new parameter on Transcript. That doesn't help at the start to know whether the event receiver can handle it or whether it's wasted energy. I'll add that to #33 for test. One has to be sure the receiver can recover from not receiving the 'end' event (AudioStop in ASR), as it will have sent the Transcript/final unsolicited. But streaming text to TTS will take a couple of changes. The Synthesize will need some id/timestamp to sync with the others and, if not another event, then a continued:bool/false to indicate, with its id, that there is more text. So repeat Synthesize, id=same, continued=true.
It would be good to support chunked text the same way we support chunked audio. The reason is that LLMs produce text token by token, and when the text is long, we would like to start producing the audio via TTS right away instead of waiting.
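To make the repeated-Synthesize idea concrete, here is a small sketch of the sender side. `SynthesizeChunk` and `to_synthesize_events` are hypothetical names; the `id` and `continued` fields are the ones proposed above and are not part of the current Synthesize event.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class SynthesizeChunk:
    """Hypothetical event shape: one piece of text belonging to a larger request."""

    id: str                  # same id on every piece of one utterance
    text: str
    continued: bool = False  # True means more text with this id will follow


def to_synthesize_events(request_id: str, pieces: Iterable[str]) -> List[SynthesizeChunk]:
    """Repeat Synthesize with the same id; only the last piece has continued=False."""
    items = list(pieces)
    last = len(items) - 1
    return [
        SynthesizeChunk(id=request_id, text=text, continued=(index < last))
        for index, text in enumerate(items)
    ]


if __name__ == "__main__":
    for event in to_synthesize_events("utt-1", ["Hello there. ", "How are you?"]):
        print(event)
```

With continued defaulting to false, a single Synthesize without the new fields would still read as a complete request.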
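One way to picture the request: buffer the LLM tokens and hand each finished sentence to TTS as soon as it is complete. This is only a sketch under the assumption that sentence-ending punctuation is a good enough boundary; `sentences_from_tokens` is a hypothetical helper and the split is deliberately naive (it would also break after "3." in "3.14").

```python
from typing import Iterable, Iterator

SENTENCE_END = (".", "!", "?")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield text chunks at sentence boundaries while tokens are still arriving."""
    buffer = ""
    for token in tokens:
        buffer += token
        if buffer.rstrip().endswith(SENTENCE_END):
            yield buffer.strip()  # ready for TTS before the LLM has finished
            buffer = ""
    if buffer.strip():
        yield buffer.strip()  # flush whatever is left when the stream ends


if __name__ == "__main__":
    llm_tokens = ["Hel", "lo", " there", ".", " How", " are", " you", "?", " Fine"]
    for sentence in sentences_from_tokens(llm_tokens):
        print(sentence)
```

Because it is a generator, the first sentence is available as soon as its closing punctuation arrives, so audio can start without waiting for the whole response.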