large-v3-turbo Core ML: no transcription after segments of noise #2496
Replies: 2 comments
You might try the option …
Whisper is used via whisper.cpp, through the whisper.cpp Go bindings interacting directly with the upstream Whisper model. Because the 16-core Neural Engine is used, CPU load is negligible. large-v3-turbo is 100× better than any model before it, just like the M-series processor.
The issue here is long segments of noise interspersed with speech.
A 4-hour audio file (16 kHz, 16-bit mono PCM) is transcribed in one go.
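Whisper expects exactly this input format, so it is worth confirming a file really is 16 kHz 16-bit mono PCM before committing to a 4-hour run. A minimal sketch using Python's standard `wave` module (the helper name is my own):

```python
import wave

def is_whisper_ready(path: str) -> bool:
    """Check that a WAV file is 16 kHz, 16-bit, mono PCM --
    the input format whisper.cpp expects."""
    with wave.open(path, "rb") as w:
        return (w.getframerate() == 16000
                and w.getsampwidth() == 2   # 16-bit = 2 bytes per sample
                and w.getnchannels() == 1)  # mono
```

If the file is anything else (say, 44.1 kHz stereo), resample it first, e.g. `ffmpeg -i input -ar 16000 -ac 1 -c:a pcm_s16le out.wav`.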
The first conversation, about 30 seconds long and starting 30 seconds in, transcribes great.
Subsequently, after 90 minutes of street noise, transcription does not work at all for:
- 20-minute good-fidelity conversations
- shorter conversations, down to 20 seconds
- single utterances
There are about 10 segments of conversation in the 4 hours; only the first is transcribed.
Issue: after minutes of noise, no transcription at all is output.
- Instead there are repeated hallucinations: "Okay.", "I'm sorry.", "One, two, three, four, five."
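These repeated fillers are the classic Whisper hallucination loop on non-speech audio. A stop-gap, independent of whisper.cpp itself, is to post-filter the transcript and drop segments whose text keeps repeating. A sketch (the function name and the repeat limit are my own choices, not a whisper.cpp feature):

```python
def drop_hallucination_loops(segments: list[str], max_repeats: int = 2) -> list[str]:
    """Keep at most `max_repeats` consecutive occurrences of the same
    (normalized) segment text; hallucination loops emit the same
    filler line over and over, real speech rarely does."""
    out: list[str] = []
    prev, run = None, 0
    for text in segments:
        key = text.strip().lower()
        run = run + 1 if key == prev else 1
        prev = key
        if run <= max_repeats:
            out.append(text)
    return out
```

This does not recover the lost speech; it only cleans the output. The real fix is to keep the noise out of the decoder in the first place.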
Although large-v3-turbo transcribes great when it is working, and
Core ML on the Neural Engine is extremely fast,
this has never worked for Whisper, and there are plenty of complaints by others about segments of silence or noise, i.e. non-speech.
What is the recommended way to get everything transcribed?
I tried the speech-threshold option with various values, which does not help.
Other ideas are: preprocessing, cutting the audio into segments, and Whisper options.
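Cutting the audio is probably the most reliable of those ideas: detect spans that contain energy, transcribe each span separately, and never feed the decoder the long noise-only stretches. A minimal energy-based splitter over raw 16-bit mono PCM, pure standard library; the threshold of 500 and the frame/gap sizes are assumptions to tune, and a model-based VAD (e.g. Silero) will separate speech from street noise far better than a plain RMS gate, which keeps any loud audio. This only illustrates the splitting mechanics:

```python
import struct

def speech_segments(pcm: bytes, sample_rate: int = 16000,
                    frame_ms: int = 30, threshold: float = 500.0,
                    min_gap_s: float = 1.0) -> list[tuple[float, float]]:
    """Return (start_s, end_s) spans whose RMS exceeds `threshold`.

    `pcm` is 16-bit signed little-endian mono PCM. Loud frames
    closer together than `min_gap_s` are merged into one segment.
    """
    spf = sample_rate * frame_ms // 1000          # samples per frame
    n = len(pcm) // 2
    samples = struct.unpack("<%dh" % n, pcm[: n * 2])

    loud = []  # start times (s) of frames above the threshold
    for i in range(0, n - spf + 1, spf):
        frame = samples[i:i + spf]
        rms = (sum(s * s for s in frame) / spf) ** 0.5
        if rms >= threshold:
            loud.append(i / sample_rate)

    segments: list[list[float]] = []
    for t in loud:
        if segments and t - segments[-1][1] < min_gap_s:
            segments[-1][1] = t + frame_ms / 1000  # extend current span
        else:
            segments.append([t, t + frame_ms / 1000])
    return [(round(a, 3), round(b, 3)) for a, b in segments]
```

Each returned span can then be cut out (with a little padding on both sides) and passed to whisper.cpp as its own short file, so a 90-minute noise stretch never reaches the model at all.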