Inaccurate Splitting with whisper #14

Dannypeja · 2023-09-28T06:28:44Z

Hey I wanted to ask if anybody else is facing this issue.
I am using parts of this repo to split up a long text for utterances and speakers with diarization.

However, the split text is not word-accurate. Most of the split parts are cut off at the end, which is quite bad for TTS datasets.
Increasing the padding didn't solve the cut-off words; it just moved the cut to a later position in the sentence.

Any ideas?
Thanks a lot!

JarodMica · 2023-09-28T06:45:17Z

I believe there are some accuracy limitations if there are multiple speakers in the audio as this can throw off time stamps. You can try to clean out as much background noise as possible with UVR to help out whisper, but I've noticed the accuracy of it varies from dataset to dataset. You might be able to try out my other repo that uses a silence threshold to split a dataset to see if that works better for your needs: https://github.com/JarodMica/audiosplitter

Note, this does not have speaker diarization though.
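
For readers unfamiliar with the silence-threshold idea, here is a minimal sketch of the general technique. It is not the audiosplitter implementation; the function name, the parameters (`threshold`, `min_silence_s`), and the assumption that the audio is already loaded as a list of float samples in the range -1.0 to 1.0 are all illustrative.

```python
def find_speech_regions(samples, sample_rate, threshold=0.02, min_silence_s=0.3):
    """Return (start_s, end_s) spans of audio separated by enough quiet.

    Hypothetical helper: `samples` is assumed to be a sequence of floats
    in [-1.0, 1.0]; a sample counts as "loud" if its magnitude reaches
    `threshold`.
    """
    min_silence = round(min_silence_s * sample_rate)  # quiet run, in samples
    regions = []
    start = None      # index where the current loud region began
    last_loud = None  # index of the most recent loud sample
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            if start is None:
                start = i
            last_loud = i
        elif start is not None and i - last_loud >= min_silence:
            # Enough consecutive quiet samples: close the region at the
            # last loud sample, so trailing silence is not included.
            regions.append((start / sample_rate, (last_loud + 1) / sample_rate))
            start = None
    if start is not None:
        # Audio ended while a region was still open.
        regions.append((start / sample_rate, (last_loud + 1) / sample_rate))
    return regions
```

Because the cut points fall inside pauses rather than on model-estimated timestamps, this kind of split tends not to clip word endings, but it cannot tell speakers apart.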

Dannypeja · 2023-09-28T06:53:18Z

Yeah, diarization is key for my purpose, unfortunately.
Thanks for the quick response!
For splitting by silence threshold, I found Audacity to be the gold standard.

I propose the following workaround for my issue:
Use whisper to detect whenever speaker x starts talking.
Export the marks to an editing software. Manually cut out longer talk sections.
Now use Audacity or your audiosplitter to get good datasets.
Then run transcription again.
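
The "export the marks" step could be sketched roughly as follows, assuming the diarization output is available as `(start_s, end_s, speaker_label)` tuples sorted by start time. That tuple layout, the function name, and `max_gap_s` are assumptions for illustration, not this repo's output format. The result uses the tab-separated `start<TAB>end<TAB>label` text layout that Audacity reads as a label track.

```python
def audacity_labels(segments, speaker, max_gap_s=1.0):
    """Build Audacity label-track text for one speaker's talk sections.

    Hypothetical helper: `segments` is assumed to be an iterable of
    (start_s, end_s, speaker_label) tuples sorted by start time.
    Consecutive turns of `speaker` are merged into one longer section
    when no other speaker talks in between and the pause is short.
    """
    merged = []
    interrupted = False  # True once another speaker has talked since the last turn
    for start, end, label in segments:
        if label != speaker:
            interrupted = True
            continue
        if merged and not interrupted and start - merged[-1][1] <= max_gap_s:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current section
        else:
            merged.append([start, end])              # begin a new section
        interrupted = False
    # One label per line: start, end, and text separated by tabs.
    return "".join(f"{s:.3f}\t{e:.3f}\t{speaker}\n" for s, e in merged)
```

Writing the returned string to a `.txt` file and importing it as labels puts a marker on each longer talk section, which can then be cut out by hand before splitting on silence and transcribing again.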
