
Nan logits when performing inference using ModernBERT #35574

Closed
yaswanthg15 opened this issue Jan 9, 2025 · 3 comments
yaswanthg15 commented Jan 9, 2025

System Info

transformers == 4.48.0.dev0
torch == 2.2.2

Description

I have fine-tuned a ModernBERT model for a multi-label classification task. When performing batched inference, the logits for every sample in a batch are NaN except for one sample. That is, in each batch, all logits other than one sample's are NaN.

Who can help?

@tomaarsen @ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Simply calling model(**batch) produces the NaN logits, while with batch size 1 no issue exists.
Code:

import torch
from tqdm import tqdm

model.eval()
labels, predictions, confidences = [], [], []

with torch.no_grad():
    for batch in tqdm(val_loader):
        # Collect ground-truth labels, then drop them so they are not passed to the model
        labels.extend(batch['labels'].cpu().numpy())
        batch.pop('labels')
        batch = {k: v.to(device) for k, v in batch.items()}

        outputs = model(**batch)

        # Multi-label setup: independent sigmoid per label, thresholded at 0.5
        probs = torch.sigmoid(outputs.logits)
        preds = (probs > 0.5).int()
        confs = probs

        predictions.extend(preds.cpu().numpy())
        confidences.extend(confs.cpu().numpy())
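To pin down the reported pattern (all rows NaN except one), a small diagnostic helper can flag which rows of the logit tensor contain NaNs before thresholding. This is a hypothetical sketch, not from the original report; `nan_row_mask` and the fabricated logit tensor are illustrative only:

```python
import torch

def nan_row_mask(logits: torch.Tensor) -> torch.Tensor:
    """Return a boolean mask marking rows whose logits contain any NaN."""
    return torch.isnan(logits).any(dim=-1)

# Fabricated (batch=3, labels=4) logits reproducing the reported shape of the bug:
# one healthy row, the rest NaN.
logits = torch.tensor([
    [0.2, -1.3, 0.7, 0.0],
    [float("nan")] * 4,
    [float("nan")] * 4,
])
mask = nan_row_mask(logits)
print(mask.tolist())  # [False, True, True]
```

Logging `mask` per batch inside the inference loop would show whether the surviving sample is always at the same batch index, which can hint at a padding or attention-mask problem.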

Expected behavior

Expecting valid logit values instead of NaN.

yaswanthg15 added the bug label Jan 9, 2025
tomaarsen (Member) commented:

Could you add a snippet to reproduce? I've done lots of inference and I've never seen NaNs yet.

  • Tom Aarsen

yaswanthg15 (Author) commented:

@tomaarsen It is not possible for me to make the model public, but as for the issue, I am performing a simple inference loop in PyTorch.

ArthurZucker (Collaborator) commented:

In order to help, we would need to know:

  • the dtype
  • the attention implementation

This should help us, as a lot of NaNs can come from SDPA or casting.
