large-v3-turbo Core ML: no transcription after segments of noise #2496
Replies: 2 comments
You might try the option …
Whisper is used via whisper.cpp, through the whisper.cpp Go bindings interacting directly with the upstream Whisper model. Because the 16-core Neural Engine is used, CPU load is negligible. large-v3-turbo is 100× better than any model before it, just like the M-series processor.
The issue here is long segments of noise interspersed with speech.
A 4-hour audio file (16 kHz, 16-bit mono PCM) is transcribed in one go.
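Whisper expects exactly this input format, so it is worth confirming a file really is 16 kHz 16-bit mono PCM before committing to a 4-hour run. A minimal sketch using Python's standard `wave` module (the helper name is my own):

```python
import wave

def is_whisper_ready(path: str) -> bool:
    """Check that a WAV file is 16 kHz, 16-bit, mono PCM --
    the input format whisper.cpp expects."""
    with wave.open(path, "rb") as w:
        return (w.getframerate() == 16000
                and w.getsampwidth() == 2   # 16-bit = 2 bytes per sample
                and w.getnchannels() == 1)  # mono
```

If the file is anything else (say, 44.1 kHz stereo), resample it first, e.g. `ffmpeg -i input -ar 16000 -ac 1 -c:a pcm_s16le out.wav`.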
The first conversation, about 30 seconds long and starting 30 seconds in, transcribes great.
Subsequently, after 90 minutes of street noise, transcription does not work at all for:
- 20-minute good-fidelity conversations
- shorter conversations, down to 20 seconds
- single utterances
There are about 10 segments of conversation in the 4 hours; only the first is transcribed.
Issue: after minutes of noise, no transcription at all is output.
- Instead there are repeated hallucinations: "Okay.", "I'm sorry.", "One, two, three, four, five."
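These repeated fillers are the classic Whisper hallucination loop on non-speech audio. A stop-gap, independent of whisper.cpp itself, is to post-filter the transcript and drop segments whose text keeps repeating. A sketch (the function name and the repeat limit are my own choices, not a whisper.cpp feature):

```python
def drop_hallucination_loops(segments: list[str], max_repeats: int = 2) -> list[str]:
    """Keep at most `max_repeats` consecutive occurrences of the same
    (normalized) segment text; hallucination loops emit the same
    filler line over and over, real speech rarely does."""
    out: list[str] = []
    prev, run = None, 0
    for text in segments:
        key = text.strip().lower()
        run = run + 1 if key == prev else 1
        prev = key
        if run <= max_repeats:
            out.append(text)
    return out
```

This does not recover the lost speech; it only cleans the output. The real fix is to keep the noise out of the decoder in the first place.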
Although large-v3-turbo transcribes great when it is working, and
Core ML on the Neural Engine is extremely fast,
this has never worked for Whisper, and there are plenty of complaints by others about segments of silence or noise, i.e. non-speech.
What is the recommended way to get everything transcribed?
I tried the speech-threshold option with various values, which does not help.
Other ideas are: preprocessing, cutting the audio into segments, and Whisper options.
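Cutting the audio is probably the most reliable of those ideas: detect spans that contain energy, transcribe each span separately, and never feed the decoder the long noise-only stretches. A minimal energy-based splitter over raw 16-bit mono PCM, pure standard library; the threshold of 500 and the frame/gap sizes are assumptions to tune, and a model-based VAD (e.g. Silero) will separate speech from street noise far better than a plain RMS gate, which keeps any loud audio. This only illustrates the splitting mechanics:

```python
import struct

def speech_segments(pcm: bytes, sample_rate: int = 16000,
                    frame_ms: int = 30, threshold: float = 500.0,
                    min_gap_s: float = 1.0) -> list[tuple[float, float]]:
    """Return (start_s, end_s) spans whose RMS exceeds `threshold`.

    `pcm` is 16-bit signed little-endian mono PCM. Loud frames
    closer together than `min_gap_s` are merged into one segment.
    """
    spf = sample_rate * frame_ms // 1000          # samples per frame
    n = len(pcm) // 2
    samples = struct.unpack("<%dh" % n, pcm[: n * 2])

    loud = []  # start times (s) of frames above the threshold
    for i in range(0, n - spf + 1, spf):
        frame = samples[i:i + spf]
        rms = (sum(s * s for s in frame) / spf) ** 0.5
        if rms >= threshold:
            loud.append(i / sample_rate)

    segments: list[list[float]] = []
    for t in loud:
        if segments and t - segments[-1][1] < min_gap_s:
            segments[-1][1] = t + frame_ms / 1000  # extend current span
        else:
            segments.append([t, t + frame_ms / 1000])
    return [(round(a, 3), round(b, 3)) for a, b in segments]
```

Each returned span can then be cut out (with a little padding on both sides) and passed to whisper.cpp as its own short file, so a 90-minute noise stretch never reaches the model at all.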