Support Roberta embedding models #9387
Signed-off-by: Max de Bayser <[email protected]>
Model implementation looks good (though a bit hacky). Can you add tests for the model?
Also, remember to update the Supported Models page.
It would also be nice to consider adding the
Great! What's the ETA on this?
@DarkLight1337, yes, this would be great. I was thinking that we could use your chat embedding API to format a sentence pair, separated by a separator token, as input to sentence classification models. The only problem would be the token type tensor, which also has to be passed as input. But that may be outside the scope of this issue; we could add it in another PR to keep the scope of each PR small.
Sure, I'll add the tests. I don't disagree that this is a bit hacky. Should we make the Bert classes more generic so that we can pass the embedding layer class as a parameter?
That would be great. Another way would be to have an abstract
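To make the "pass the embedding layer class as a parameter" idea concrete, here is a minimal sketch in plain Python. All class names (`ToyBertEmbedding`, `ToyRobertaEmbedding`, `ToyEncoder`) and the `padding_idx` keyword are hypothetical and are not the vLLM classes; lists stand in for tensors, and only the position-id logic is modeled.

```python
class ToyBertEmbedding:
    """Hypothetical stand-in for a Bert-style embedding layer."""

    position_offset = 0  # Bert-style positions start at 0

    def position_ids(self, seq_len):
        return [pos + self.position_offset for pos in range(seq_len)]


class ToyRobertaEmbedding(ToyBertEmbedding):
    """Same behaviour, but positions start at padding_idx + 1."""

    def __init__(self, padding_idx):
        self.position_offset = padding_idx + 1


class ToyEncoder:
    """Hypothetical model wrapper that takes the embedding class as a parameter."""

    def __init__(self, embedding_class=ToyBertEmbedding, **embedding_kwargs):
        self.embeddings = embedding_class(**embedding_kwargs)

    def position_ids_for(self, input_ids):
        return self.embeddings.position_ids(len(input_ids))


bert_like = ToyEncoder()
roberta_like = ToyEncoder(ToyRobertaEmbedding, padding_idx=1)
print(bert_like.position_ids_for([5, 6, 7]))     # [0, 1, 2]
print(roberta_like.position_ids_for([5, 6, 7]))  # [2, 3, 4]
```

With this shape, the encoder code is shared and only the embedding class differs between the two model families.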
Signed-off-by: Flavia Beo <[email protected]>
LGTM, thanks for adding this!
@DarkLight1337, I realized that with Roberta models the position_ids start at padding_idx + 1 (see here and here). I've added a line of code to increment all position_ids by that amount. Without this, the results I get in the STS12 task from the MTEB benchmark for
In my tests, all the position_ids in vLLM for the embedding use case start at 0 and end at len()-1, and there are no padding tokens because the input tensors are 1-dimensional without padding. For example:
Is there a scenario in which padding tokens could be present (other than the user inserting them in the input text)?
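To illustrate the offset described above, here is a minimal sketch in plain Python (lists instead of tensors). The function names are hypothetical, `padding_idx = 1` and the token ids are assumed values for illustration, and `cumsum_position_ids` reflects my understanding of the cumulative-sum scheme used for Roberta position ids rather than a copy of any library code.

```python
def cumsum_position_ids(input_ids, padding_idx):
    """Non-pad tokens count up from padding_idx + 1; pad tokens keep padding_idx."""
    position_ids = []
    seen = 0  # number of non-pad tokens seen so far
    for token_id in input_ids:
        if token_id == padding_idx:
            position_ids.append(padding_idx)
        else:
            seen += 1
            position_ids.append(seen + padding_idx)
    return position_ids


def shifted_position_ids(seq_len, padding_idx):
    """Take 0..seq_len-1 positions and shift them by padding_idx + 1."""
    return [pos + padding_idx + 1 for pos in range(seq_len)]


padding_idx = 1                  # assumed value for illustration
input_ids = [0, 9064, 232, 2]    # made-up token ids, no padding

print(cumsum_position_ids(input_ids, padding_idx))        # [2, 3, 4, 5]
print(shifted_position_ids(len(input_ids), padding_idx))  # [2, 3, 4, 5]
```

The two agree only when the input contains no padding tokens, which is why the question about padding matters: with padding present, a plain shift would assign different ids than the cumulative-sum scheme.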
Hmm, we may need to add a correctness test that compares against HF then.
Don't think so, since vLLM encodes each prompt separately. Just to be sure, you can add an assertion statement so we know if our assumption is false.
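One possible shape for such an assertion, sketched in plain Python under stated assumptions: `positions` is the flattened list of position ids for all prompts in a batch, and `seq_lens` (a hypothetical argument) gives the length of each prompt. Real model code would operate on tensors; this only shows the check itself.

```python
def check_positions_start_at_zero(positions, seq_lens):
    """Assert that each prompt's positions are exactly 0, 1, ..., len - 1."""
    assert sum(seq_lens) == len(positions), "seq_lens must cover all positions"
    start = 0
    for seq_len in seq_lens:
        chunk = positions[start:start + seq_len]
        assert chunk == list(range(seq_len)), (
            f"unexpected positions at offset {start}: {chunk}"
        )
        start += seq_len


# Two prompts of length 3 and 2, concatenated without padding: passes silently.
check_positions_start_at_zero([0, 1, 2, 0, 1], [3, 2])
```

If a prompt ever arrived with positions that do not start at 0, the assertion would fail loudly instead of silently producing shifted embeddings.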
Are you still considering adding this to the PR? If not, I could try to make an attempt.
@DarkLight1337, I've added an
Yes, but it would have to be
The existing tests for text-only embedding models already use sentence-transformers, so it should be pretty straightforward to add this model to the list.
The test is failing with:
Unsupported('dynamic shape operator: aten.nonzero.default; to enable, set torch._dynamo.config.capture_dynamic_output_shape_ops = True

from user code:
 File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/roberta.py", line 107, in forward
 assert len(torch.nonzero(positions[start_pos])) == 0

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:
 import torch._dynamo
 torch._dynamo.config.suppress_errors = True
')
The discussion on
Signed-off-by: Max de Bayser <[email protected]> Signed-off-by: Flavia Beo <[email protected]> Co-authored-by: Flavia Beo <[email protected]>
Signed-off-by: Max de Bayser <[email protected]> Signed-off-by: Flavia Beo <[email protected]> Co-authored-by: Flavia Beo <[email protected]> Signed-off-by: Maxime Fournioux <[email protected]>
Signed-off-by: Max de Bayser <[email protected]> Signed-off-by: Flavia Beo <[email protected]> Co-authored-by: Flavia Beo <[email protected]> Signed-off-by: rickyx <[email protected]>
Signed-off-by: Max de Bayser <[email protected]> Signed-off-by: Flavia Beo <[email protected]> Co-authored-by: Flavia Beo <[email protected]> Signed-off-by: Tyler Michael Smith <[email protected]>
This PR adds support for Roberta embedding models. The architecture is mostly the same as Bert; the only thing that changes is the padding token in the Embedding layer, so this PR tries to reuse the Bert modeling classes as much as possible. Some of the models also need head size 32, so this size is added to the kernels here.
cc: @robertgshaw2-neuralmagic, @DarkLight1337
FIX #9847