Replies: 1 comment
The paper primarily focused on zero-shot robustness and did not include fine-tuned performance. We are mildly interested in Whisper's fine-tuned performance, but unfortunately we don't currently have plans to perform or publish fine-tuning studies. We tried to write decoding.py in an "object-oriented" manner, in the hope of making future extensions like language-model integration easier. For example, a language model could be used at the token level by replacing the …
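To illustrate the idea of token-level language-model integration, here is a minimal sketch of "shallow fusion": the ASR decoder's logits are shifted by a weighted LM log-probability before the next token is chosen. All names here (`ToyBigramLM`, `fuse_logits`, `lm_weight`) are hypothetical and not part of the Whisper codebase; this only shows the general technique, not how decoding.py would actually be extended.

```python
import math

class ToyBigramLM:
    """A stand-in LM: log P(next | prev) from a hand-written bigram table."""
    def __init__(self, bigrams, vocab_size):
        self.bigrams = bigrams          # {(prev_token, next_token): probability}
        self.vocab_size = vocab_size

    def log_prob(self, prev, token):
        # Fall back to a uniform floor for unseen bigrams.
        p = self.bigrams.get((prev, token), 1.0 / self.vocab_size)
        return math.log(p)

def fuse_logits(asr_logits, prev_token, lm, lm_weight=0.3):
    """Shallow fusion: ASR logit + lm_weight * LM log-prob, per token."""
    return [
        logit + lm_weight * lm.log_prob(prev_token, tok)
        for tok, logit in enumerate(asr_logits)
    ]

# With a strong LM preference for token 1, the fused argmax can differ
# from the acoustic-only argmax.
lm = ToyBigramLM({(0, 1): 0.9, (0, 2): 0.05}, vocab_size=4)
asr_logits = [0.0, 1.0, 1.05, 0.0]
fused = fuse_logits(asr_logits, prev_token=0, lm=lm, lm_weight=1.0)
```

With `lm_weight=0` this degenerates to plain greedy decoding over the acoustic logits, which is why a tunable fusion weight is the usual knob in practice.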
First of all, thanks for releasing this under the MIT license and making it easy to test out. I have already tried it out with Finnish.
I have previously fine-tuned wav2vec2-xlsr for Finnish and made a few demos with it. I'd like to understand whether I could eventually evaluate this model for my "auto-english-subtitles-demo for Finnish spoken videos" (demo available here: https://huggingface.co/spaces/Finnish-NLP/Fin-Eng-ASR-autosubtitles).
I would like to know about the following:
Already answered
3. Would it be possible to get word-level timestamps from this model, like with Wav2Vec2 in huggingface/transformers#11307? (It seems this is already answered in #3.)
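For context on what word-level timestamps from a CTC model like wav2vec2 look like, here is a minimal sketch: given a frame-level alignment (one emitted character per frame, `""` standing in for the CTC blank) and the per-frame duration, collapse repeats and group frames into words with start/end times. The function name `frames_to_word_timestamps` is hypothetical; this is illustrative, not the Transformers API.

```python
def frames_to_word_timestamps(frame_chars, frame_dur):
    """frame_chars: one character per frame ("" = CTC blank, " " = word gap).
    Returns [(word, start_sec, end_sec), ...].
    Note: real CTC needs a blank between genuine double letters; here
    consecutive identical characters are always collapsed."""
    words, current, start = [], "", None
    prev = ""
    for i, ch in enumerate(frame_chars):
        if ch == "" or ch == " ":
            if current:  # close the word at this frame boundary
                words.append((current, start * frame_dur, i * frame_dur))
                current, start = "", None
            prev = ch
            continue
        if ch != prev:  # collapse CTC repeats
            if not current:
                start = i  # first frame of a new word
            current += ch
        prev = ch
    if current:  # flush a word that runs to the last frame
        words.append((current, start * frame_dur, len(frame_chars) * frame_dur))
    return words

# e.g. 10 frames at 20 ms each:
words = frames_to_word_timestamps(
    ["h", "h", "i", "", " ", "y", "o", "o", "u", ""], 0.02)
```

The frame alignment itself would come from the model's per-frame argmax (or a forced alignment), which is what the linked transformers issue discusses.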