
Found an adversarial sample/track #23

Answered by jongwook
knaik asked this question in Q&A

I believe by "tokenization" you meant the phrase boundaries produced by the model. The term more commonly refers to converting text to/from a sequence of integers, as done in tokenizer.py.
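
As a point of reference, here is a minimal sketch of that second sense of "tokenization" (text to/from integers), using the tokenizer exposed by tokenizer.py. It assumes the openai-whisper package is installed; the exact wrapper class behind get_tokenizer has changed between releases, but encode and decode behave as shown.

```python
# Minimal sketch: "tokenization" here means converting text to/from a
# sequence of integer token IDs, using the tokenizer from whisper/tokenizer.py.
from whisper.tokenizer import get_tokenizer

tokenizer = get_tokenizer(multilingual=True)

text = "Hello, world."
ids = tokenizer.encode(text)        # text -> list of integer token IDs
roundtrip = tokenizer.decode(ids)   # integer token IDs -> text

print(ids)        # a short list of integers
print(roundtrip)  # "Hello, world."
```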

> Why do the transcriptions/translations seem to depend on the input bitrate, and why does that affect tokenization? Is there an optimal type of source?

There might be a correlation between audio quality and typical phrase lengths in the subtitles.

> Is there a way to fine-tune the window frame to more tightly or loosely match the predicted tokens?

Changing the decoding rules for the timestamp tokens, as in #139 (comment), can affect how long the segments are in general, but not precisely.
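
One rough way to see the effect of such a change is to look at the segment boundaries that the predicted timestamp tokens produce. This is only a sketch: "audio.mp3" is a placeholder path, and it relies on the segments returned by model.transcribe.

```python
# Sketch: inspect segment lengths derived from the predicted timestamp tokens.
# Re-running this before and after changing the timestamp decoding rules shows
# how segment durations shift in aggregate.
import whisper

model = whisper.load_model("base")
result = model.transcribe("audio.mp3")  # placeholder path

for seg in result["segments"]:
    duration = seg["end"] - seg["start"]
    print(f"{seg['start']:7.2f} -> {seg['end']:7.2f}  ({duration:5.2f}s)  {seg['text']}")
```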

If I understand correctly, is…
