Replies: 1 comment
The paper primarily focused on zero-shot robustness and did not include fine-tuned performance. We are mildly interested in Whisper's fine-tuned performance, but unfortunately we don't currently have plans to perform or publish fine-tuning studies. We tried to write decoding.py in an "object-oriented" manner, in the hope of making future extensions like language-model integration easier. For example, a language model could be used at the token level by replacing the …
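To illustrate the idea of token-level language-model integration, here is a minimal sketch of "shallow fusion": the ASR decoder's logits are shifted by a weighted LM log-probability before the next token is chosen. All names here (`ToyBigramLM`, `fuse_logits`, `lm_weight`) are hypothetical and not part of the Whisper codebase; this only shows the general technique, not how decoding.py would actually be extended.

```python
import math

class ToyBigramLM:
    """A stand-in LM: log P(next | prev) from a hand-written bigram table."""
    def __init__(self, bigrams, vocab_size):
        self.bigrams = bigrams          # {(prev_token, next_token): probability}
        self.vocab_size = vocab_size

    def log_prob(self, prev, token):
        # Fall back to a uniform floor for unseen bigrams.
        p = self.bigrams.get((prev, token), 1.0 / self.vocab_size)
        return math.log(p)

def fuse_logits(asr_logits, prev_token, lm, lm_weight=0.3):
    """Shallow fusion: ASR logit + lm_weight * LM log-prob, per token."""
    return [
        logit + lm_weight * lm.log_prob(prev_token, tok)
        for tok, logit in enumerate(asr_logits)
    ]

# With a strong LM preference for token 1, the fused argmax can differ
# from the acoustic-only argmax.
lm = ToyBigramLM({(0, 1): 0.9, (0, 2): 0.05}, vocab_size=4)
asr_logits = [0.0, 1.0, 1.05, 0.0]
fused = fuse_logits(asr_logits, prev_token=0, lm=lm, lm_weight=1.0)
```

With `lm_weight=0` this degenerates to plain greedy decoding over the acoustic logits, which is why a tunable fusion weight is the usual knob in practice.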
First of all, thanks for releasing this under the MIT license and making it easy to test out. I have already tried it out with Finnish.
I have previously fine-tuned wav2vec2-xlsr for Finnish and made a few demos with it. I'd like to understand whether I could eventually evaluate this model for my "auto-english-subtitles-demo for Finnish spoken videos" (demo available here: https://huggingface.co/spaces/Finnish-NLP/Fin-Eng-ASR-autosubtitles).
I would like to know about the following:
Already answered
3. Would it be possible to get word-level timestamps from this model, like with Wav2Vec2 in huggingface/transformers#11307? (It seems this is already answered in #3.)
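For context on what word-level timestamps from a CTC model like wav2vec2 look like, here is a minimal sketch: given a frame-level alignment (one emitted character per frame, `""` standing in for the CTC blank) and the per-frame duration, collapse repeats and group frames into words with start/end times. The function name `frames_to_word_timestamps` is hypothetical; this is illustrative, not the Transformers API.

```python
def frames_to_word_timestamps(frame_chars, frame_dur):
    """frame_chars: one character per frame ("" = CTC blank, " " = word gap).
    Returns [(word, start_sec, end_sec), ...].
    Note: real CTC needs a blank between genuine double letters; here
    consecutive identical characters are always collapsed."""
    words, current, start = [], "", None
    prev = ""
    for i, ch in enumerate(frame_chars):
        if ch == "" or ch == " ":
            if current:  # close the word at this frame boundary
                words.append((current, start * frame_dur, i * frame_dur))
                current, start = "", None
            prev = ch
            continue
        if ch != prev:  # collapse CTC repeats
            if not current:
                start = i  # first frame of a new word
            current += ch
        prev = ch
    if current:  # flush a word that runs to the last frame
        words.append((current, start * frame_dur, len(frame_chars) * frame_dur))
    return words

# e.g. 10 frames at 20 ms each:
words = frames_to_word_timestamps(
    ["h", "h", "i", "", " ", "y", "o", "o", "u", ""], 0.02)
```

The frame alignment itself would come from the model's per-frame argmax (or a forced alignment), which is what the linked transformers issue discusses.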