Make whisper transcribe numbers in the actual spoken words #1041

Answered by jongwook
Thresher12 asked this question in Q&A

This isn't an explicit conversion step; the model predicts the most likely text output end-to-end. You can try the following, which blocks all numeric tokens and encourages the model to transcribe numbers literally, as the words that were spoken.

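from whisper.tokenizer import get_tokenizer

tokenizer = get_tokenizer(multilingual=False)  # use multilingual=True if using multilingual model

def is_number(text):
    # Digits only, ignoring one leading space; reject empty text, since all() over "" is True
    text = text.removeprefix(" ")  # str.removeprefix requires Python 3.9+
    return bool(text) and all(c in "0123456789" for c in text)

# IDs of every ordinary (non-special) token that decodes to digits only
number_tokens = [i for i in range(tokenizer.eot) if is_number(tokenizer.decode([i]))]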

...

model.transcribe("audio.mp3", suppress_tokens=[-1] + number_tokens, ...)
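
To see which tokens a digits-only filter of this kind selects, here is a minimal sketch that runs the check on a toy vocabulary. The `toy_vocab` strings and the `is_digit_token` helper are hypothetical stand-ins for `tokenizer.decode([i])` and are not real Whisper token IDs. Python's `all()` returns True for an empty sequence, so the sketch requires a non-empty body to avoid selecting blank tokens.

```python
def is_digit_token(text: str) -> bool:
    """True if the text is one optional leading space followed by ASCII digits only."""
    body = text.removeprefix(" ")  # requires Python 3.9+
    # Reject an empty body explicitly: all() over an empty string is True.
    return body != "" and all(c in "0123456789" for c in body)

# Hypothetical decoded tokens, keyed by made-up IDs (assumption, for illustration only).
toy_vocab = {0: " the", 1: " 20", 2: "7", 3: " ", 4: " twenty", 5: "3rd", 6: ""}

number_tokens = [i for i, text in toy_vocab.items() if is_digit_token(text)]
print(number_tokens)  # [1, 2]
```

Only the pure-digit entries are selected. A mixed token such as "3rd" is left alone, so a filter like this may not catch every way a numeral can appear in the output.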

Answer selected by jongwook
