Poor quality transcripts with V3 for Dutch #1843

ME-researchgroup · 2023-11-27T08:08:58Z

Hi!

I have switched back to V2 for now because V3's output is very poor for my use case. I mainly use Whisper to transcribe interviews. V2 performed pretty well in most cases on recordings of Teams calls, but V3's output is pretty much useless.

V3 seems to get stuck in loops a lot more often:

[20:26.760 --> 20:28.760]  Ja.
[20:28.760 --> 20:30.760]  Ja.
[20:30.760 --> 20:32.760]  Ja.
[20:32.760 --> 20:34.760]  Ja.
[20:34.760 --> 20:36.760]  Ja.
[20:36.760 --> 20:38.760]  Ja.
[20:38.760 --> 20:40.760]  Ja.
[20:40.760 --> 20:42.760]  Ja.
[20:42.760 --> 20:43.760]  Ja.
[20:43.760 --> 20:44.760]  Ja.
[20:44.760 --> 20:45.260]  Ja.
[20:45.260 --> 20:46.260]  Ja.
[20:46.260 --> 20:47.260]  Ja.
[20:47.260 --> 20:48.260]  Ja.
[20:48.260 --> 20:49.260]  Ja.
[20:49.260 --> 20:50.260]  Ja.
[20:50.260 --> 20:51.260]  Ja.
[20:51.260 --> 20:52.260]  Ja.
[20:52.260 --> 20:53.260]  Ja.
[20:53.260 --> 20:54.260]  Ja.
[20:54.260 --> 20:55.260]  Ja.
[20:55.260 --> 20:56.260]  Ja.
[20:56.260 --> 20:57.260]  Ja.
[20:57.260 --> 20:58.260]  Ja.
[20:58.260 --> 20:59.260]  Ja.
[20:59.260 --> 21:00.260]  Ja.
[21:00.260 --> 21:01.260]  Ja.
[21:01.260 --> 21:02.260]  Ja.
[21:02.260 --> 21:03.260]  Ja.
[21:03.260 --> 21:04.260]  Ja.
[21:04.260 --> 21:05.260]  Ja.
[21:05.260 --> 21:06.260]  Ja.
[21:06.260 --> 21:07.260]  Ja.
[21:07.260 --> 21:08.260]  Ja.
[21:08.260 --> 21:09.260]  Ja.
[21:09.260 --> 21:12.260]  Dus beide partijen hebben eigenlijk verantwoordelijkheden daarin, zeg je.
[21:12.260 --> 21:13.260]  Ja.
[21:13.260 --> 21:14.260]  Ja.
[21:16.260 --> 21:17.260]  Ja.
[21:17.260 --> 21:18.260]  Ja.
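
One way to quantify loops like the one above is to scan the printed segments for runs of identical text. The following is a minimal sketch, assuming the `[start --> end]  text` line format shown above; `find_repetition_runs` and `min_run` are hypothetical names and not part of Whisper. Short answers such as "Ja." can legitimately repeat in an interview, so a flagged run is only a hint to check the audio.

```python
import re

# Matches printed segment lines such as "[20:26.760 --> 20:28.760]  Ja."
SEGMENT_RE = re.compile(r"^\[(\S+) --> (\S+)\]\s*(.*)$")


def find_repetition_runs(lines, min_run=5):
    """Return (start, end, text, count) for every run of at least
    `min_run` consecutive segments that carry identical text."""
    runs = []
    current = None  # [start, end, text, count] of the run being tracked
    for line in lines:
        match = SEGMENT_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a segment line
        start, end, text = match.groups()
        text = text.strip()
        if current and current[2] == text:
            # Same text as the previous segment: extend the run.
            current[1] = end
            current[3] += 1
        else:
            if current and current[3] >= min_run:
                runs.append(tuple(current))
            current = [start, end, text, 1]
    if current and current[3] >= min_run:
        runs.append(tuple(current))
    return runs


sample = [
    "[20:26.760 --> 20:28.760]  Ja.",
    "[20:28.760 --> 20:30.760]  Ja.",
    "[20:30.760 --> 20:32.760]  Ja.",
    "[20:32.760 --> 20:34.760]  Ja.",
    "[20:34.760 --> 20:36.760]  Ja.",
    "[21:09.260 --> 21:12.260]  Dus beide partijen hebben eigenlijk verantwoordelijkheden daarin, zeg je.",
    "[21:12.260 --> 21:13.260]  Ja.",
    "[21:13.260 --> 21:14.260]  Ja.",
]

for run in find_repetition_runs(sample):
    print(run)
```

Running the same check over V2 and V3 output for the same recording gives a rough, comparable count of looped stretches.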

It also introduces new hallucinations in multiple languages, even though the language has been specified to be Dutch. I have never encountered this issue with V2:

[21:18.260 --> 21:20.260]  En wat is er van die verhandeling freaking leeg dat is een kunstst authorized?
[21:20.260 --> 21:23.260]  DitAB, dit Aboriginal & Which isques project, wat is er in de Fare during upgrade & Currenc calling at bellevue van dat de AIU aan hun uitgezaktte gedraging krijgt?
[21:23.260 --> 21:26.260]  Nou ja, zo je z无nt eigenlijk seksueel de klink University, nou kunnen ze mariko dormir en hä, het gaat allemaal best wel om kan hebben ze gezien.
[21:26.260 --> 21:38.340]  TikTok, het is eigenlijk super sh options, tapasje, Robinson 2, negen воз persuadeurens, awe, mariko voديen, deuity, behunten, ze zijn allemaal verpleeg dreameder s, echt kritiek is dat zo het Harry� choreography het opslagen delen van hun sales
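
Segments like these can be flagged automatically with a crude script check. This is a minimal sketch using only the Python standard library; `non_latin_letters` is a hypothetical helper name. It only catches letters from non-Latin scripts (CJK, Cyrillic, Arabic and so on), so English or French words mixed into Dutch text would still pass unnoticed.

```python
import unicodedata


def non_latin_letters(text):
    """Return the alphabetic characters in `text` whose Unicode name
    does not start with 'LATIN' (for example CJK or Cyrillic letters)."""
    flagged = []
    for ch in text:
        # Digits, punctuation and symbols are ignored; only letters count.
        if ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN"):
            flagged.append(ch)
    return flagged


print(non_latin_letters("Nou ja, zo je z无nt eigenlijk hä"))
print(non_latin_letters("Ja."))
```

A Dutch transcript should normally produce an empty list for every segment, so any non-empty result marks a segment worth inspecting.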

This has been happening with every Teams recording I feed to whisper.
The benchmarks suggest that performance for Dutch should be a lot better with V3.

I am not tweaking any hyperparameters and run Whisper from Python like so:
whisper.transcribe(model, audio, verbose=True, fp16=False, language="dutch")
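
For comparison, here is a hedged sketch of the same call with one decoding option changed. `condition_on_previous_text` is an existing parameter of `whisper.transcribe`; its documentation notes that disabling it can make text less consistent across windows but makes the model less prone to repetition loops. Whether it helps with V3 on these recordings is untested here. The helper names `build_transcribe_options` and `transcribe_file` and the default model name are assumptions for illustration.

```python
def build_transcribe_options(language="dutch"):
    """Options from the call above, plus one change aimed at repetition loops."""
    return {
        "language": language,
        "fp16": False,
        "verbose": True,
        # Assumption: do not feed the previous window's text back in as a prompt.
        "condition_on_previous_text": False,
    }


def transcribe_file(audio_path, model_name="large-v2"):
    """Transcribe `audio_path` with the options above (needs openai-whisper)."""
    import whisper  # imported lazily so the options helper works without it

    model = whisper.load_model(model_name)
    return whisper.transcribe(model, audio_path, **build_transcribe_options())


print(build_transcribe_options())
```

Calling `transcribe_file` with `model_name="large-v3"` and then `"large-v2"` on the same file would allow a like-for-like comparison of the two models under identical options.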

Is anyone else experiencing these issues?

TamarJanssens · 2024-12-09T15:55:12Z

Are there any updates regarding improvements for Dutch?

1 reply

realtechspecs Dec 9, 2024

> got any updates regarding improvements on Dutch?

If you have encountered a similar issue, please upload a test audio/video file.

realtechspecs · 2024-12-09T23:45:21Z

Can you share a sample audio/video file we can test with?
