Inaccurate Splitting with whisper #14

Open
Dannypeja opened this issue Sep 28, 2023 · 2 comments

Comments

@Dannypeja

Hey, I wanted to ask whether anybody else is facing this issue.
I am using parts of this repo to split up a long text into utterances and speakers with diarization.

However, the splits are not word-accurate. Most of the split parts are cut off at the end, which is quite bad for TTS datasets.
Increasing the padding didn't solve the cut-word issue; it just moved the cut to a later position in the sentence.

Any ideas?
Thanks a lot!

@JarodMica
Owner

JarodMica commented Sep 28, 2023

I believe accuracy is limited when the audio contains multiple speakers, as this can throw off the timestamps. You can try cleaning out as much background noise as possible with UVR to help Whisper, but I've noticed that its accuracy varies from dataset to dataset. You could also try my other repo, which uses a silence threshold to split a dataset, to see if that works better for your needs: https://github.com/JarodMica/audiosplitter

Note, this does not have speaker diarization though.
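
For readers unfamiliar with the idea, here is a minimal sketch of silence-threshold splitting in plain Python. It is not taken from audiosplitter and may differ from what that repo does; the function name `split_on_silence`, its parameters, and the default values are assumptions chosen for illustration.

```python
from typing import List, Tuple


def split_on_silence(
    samples: List[float],
    sample_rate: int,
    threshold: float = 0.01,
    min_silence_s: float = 0.3,
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) spans of non-silent audio.

    A sample counts as silent when its absolute amplitude is below
    `threshold`. A run of silence only splits the audio when it lasts at
    least `min_silence_s` seconds; shorter pauses stay inside a span.
    """
    min_gap = round(min_silence_s * sample_rate)  # silence length in samples
    spans: List[Tuple[float, float]] = []
    start = None      # index where the current span began
    last_loud = None  # index of the most recent non-silent sample
    for i, sample in enumerate(samples):
        if abs(sample) < threshold:
            continue
        if start is None:
            start = i
        elif i - last_loud > min_gap:
            # The pause was long enough: close the previous span here.
            spans.append((start / sample_rate, (last_loud + 1) / sample_rate))
            start = i
        last_loud = i
    if start is not None:
        spans.append((start / sample_rate, (last_loud + 1) / sample_rate))
    return spans
```

Because the cut points fall inside pauses rather than at transcript timestamps, words are less likely to be clipped, but nothing in this approach tells you who is speaking.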

@Dannypeja
Author

Yeah, the diarization is key for my purpose, unfortunately.
Thanks for the quick response!
Using a silence threshold, I found Audacity to be the gold standard.

I propose the following workaround for my issue:
Use Whisper to detect whenever speaker x starts talking.
Export the marks to an editing software. Manually cut out the longer talk sections.
Now use Audacity or your audiosplitter to get good datasets.
Then run the transcription again.
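
The first two steps of the workaround could be sketched roughly as follows. This assumes the diarization output is already available as `(start_s, end_s, speaker)` tuples, which is a hypothetical shape and not necessarily what this repo produces. The helper names are also made up for illustration. The output uses Audacity's plain-text label format, one tab-separated `start`, `end`, `label` row per line, which Audacity can import as a label track.

```python
from typing import Iterable, List, Tuple

# Hypothetical shape of one diarized segment: (start_s, end_s, speaker).
Segment = Tuple[float, float, str]


def merge_speaker_turns(segments: Iterable[Segment]) -> List[Segment]:
    """Merge consecutive segments by the same speaker into one turn."""
    turns: List[Segment] = []
    for start, end, speaker in sorted(segments):
        if turns and turns[-1][2] == speaker:
            # Same speaker keeps talking: extend the current turn.
            prev_start, prev_end, _ = turns[-1]
            turns[-1] = (prev_start, max(prev_end, end), speaker)
        else:
            turns.append((start, end, speaker))
    return turns


def to_audacity_labels(turns: Iterable[Segment]) -> str:
    """Format turns as Audacity label rows: start<TAB>end<TAB>label."""
    return "".join(f"{s:.6f}\t{e:.6f}\t{spk}\n" for s, e, spk in turns)
```

Writing the result of `to_audacity_labels(merge_speaker_turns(segments))` to a `.txt` file gives marks for the manual cutting step; the silence-based split and the second transcription pass would then run on the cut sections.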
