The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
#2335
I ran into the issue in the title when trying to use Whisper large for recognition.
Here is my setup:
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# device and torch_dtype were not shown in the original post; typical values:
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "openai/whisper-large-v3"
whisper_model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True
)
processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=whisper_model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=30,
    batch_size=8,
    torch_dtype=torch_dtype,
    device=device,
)

result = pipe(audio_data, generate_kwargs={"language": "chinese"}, return_timestamps=True)
I do see some problems. About 20% of the output files don't have timestamps. Sometimes the model transcribes background music (no lyrics, just melody) as speech, and sometimes it produces weird generations. May I know whether these problems are due to a wrong setting (like the attention mask one), or whether the model just sometimes can't get it right?
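For reference, here is a minimal sketch of how the attention mask could be made explicit by calling the model directly instead of going through the pipeline. This assumes audio_array holds a 16 kHz mono waveform; the variable names are illustrative, and I have not verified that this changes the quality issues, only that it makes the mask explicit:

# Sketch: run the processor and model directly so attention_mask is passed
# explicitly rather than inferred. audio_array is an assumed 16 kHz waveform.
inputs = processor(
    audio_array,
    sampling_rate=16000,
    return_tensors="pt",
    return_attention_mask=True,  # ask the feature extractor for an explicit mask
)
input_features = inputs.input_features.to(device, dtype=torch_dtype)
attention_mask = inputs.attention_mask.to(device)

predicted_ids = whisper_model.generate(
    input_features,
    attention_mask=attention_mask,
    language="chinese",
    return_timestamps=True,
)
text = processor.batch_decode(predicted_ids, skip_special_tokens=True)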