[Model] Add support for Qwen2-VL video embeddings input & multiple image embeddings input with varied resolutions #10221
Thanks for adding this! Are you able to add embedding input tests for Qwen2-VL?
Sure! I will update this PR with related tests soon.
Would be great if you can incorporate this into
I have noticed
So I think it should be OK to add a separate test file for this feature, maybe?
That's a good point. Let's keep the tests in a separate file then. Can you mark these tests with
The tests have passed, so LGTM. Thanks for implementing this!
Goal
Add support for Qwen2-VL multiple image embeddings input with varied resolutions
Currently, vLLM's implementation of image embedding input for Qwen2-VL requires all images in the input to have the same resolution. However, Qwen2-VL supports images of varied resolutions, and the number of vision tokens varies with the resolution. This PR fixes that.
Current vLLM implementation:
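Because each image contributes a different number of vision tokens, the embeddings cannot be described by a fixed tokens-per-image shape; each image needs its own (t, h, w) patch grid. The sketch below assumes embeddings for all images are concatenated along the token axis and accompanied by a per-image grid, in the spirit of Qwen2-VL's `image_grid_thw`. The helper names `tokens_per_image` and `split_image_embeds` are hypothetical, merge size 2 is assumed as the Qwen2-VL default, and this is not the code from this PR.

```python
from typing import List, Sequence, Tuple

# Assumption: Qwen2-VL's default spatial merge size (each 2x2 block of
# patches becomes one vision token).
MERGE_SIZE = 2


def tokens_per_image(grid_thw: Sequence[Tuple[int, int, int]],
                     merge_size: int = MERGE_SIZE) -> List[int]:
    """Vision tokens contributed by each image: t * h * w // merge_size**2."""
    return [t * h * w // merge_size ** 2 for t, h, w in grid_thw]


def split_image_embeds(embeds: Sequence, grid_thw: Sequence[Tuple[int, int, int]],
                       merge_size: int = MERGE_SIZE) -> List[Sequence]:
    """Split embeddings concatenated along the token axis into per-image chunks.

    Hypothetical helper: `embeds` holds one row per vision token for all images.
    """
    sizes = tokens_per_image(grid_thw, merge_size)
    if sum(sizes) != len(embeds):
        raise ValueError(f"expected {sum(sizes)} embedding rows, got {len(embeds)}")
    chunks, start = [], 0
    for size in sizes:
        chunks.append(embeds[start:start + size])
        start += size
    return chunks


# Two images with different resolutions -> different token counts.
grids = [(1, 16, 16), (1, 32, 24)]
print(tokens_per_image(grids))  # [64, 192]
chunks = split_image_embeds(list(range(256)), grids)
print([len(c) for c in chunks])  # [64, 192]
```

Plain lists stand in for embedding tensors here so the slicing logic stays visible; with real tensors the same per-image sizes would drive the split.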
vllm/vllm/model_executor/models/qwen2_vl.py, lines 893 to 903 in 5fb1f93
Hugging Face Transformers implementation:
https://github.com/huggingface/transformers/blob/187439c3fa139b2102a874483e9f8f0cfa8e5557/src/transformers/models/qwen2_vl/processing_qwen2_vl.py#L133-L153
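The linked processor code expands each `<|image_pad|>` token in the prompt into as many pad tokens as that image has vision tokens, derived from its grid. The following is a simplified single-prompt sketch of that idea, not the Transformers source: `expand_image_pads` is a hypothetical name, and merge size 2 is assumed as the Qwen2-VL default.

```python
from typing import Sequence, Tuple

IMAGE_PAD = "<|image_pad|>"
# Temporary marker so already-expanded pads are not matched again by replace().
PLACEHOLDER = "<|placeholder|>"


def expand_image_pads(text: str, grid_thw: Sequence[Tuple[int, int, int]],
                      merge_size: int = 2) -> str:
    """Expand the i-th image pad into t * h * w // merge_size**2 pad tokens."""
    merge_length = merge_size ** 2
    index = 0
    while IMAGE_PAD in text:
        if index >= len(grid_thw):
            raise ValueError("more image pads in the prompt than grid entries")
        t, h, w = grid_thw[index]
        # Replace only the first remaining pad, using this image's own token count.
        text = text.replace(IMAGE_PAD, PLACEHOLDER * (t * h * w // merge_length), 1)
        index += 1
    if index != len(grid_thw):
        raise ValueError("fewer image pads in the prompt than grid entries")
    return text.replace(PLACEHOLDER, IMAGE_PAD)


prompt = ("<|vision_start|><|image_pad|><|vision_end|>"
          "<|vision_start|><|image_pad|><|vision_end|>Describe both images.")
expanded = expand_image_pads(prompt, [(1, 4, 4), (1, 2, 6)])
print(expanded.count(IMAGE_PAD))  # 7 (4 + 3)
```

The placeholder detour matters: replacing pads with pads directly would let the loop keep matching tokens it had already expanded.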
Add support for Qwen2-VL video embeddings input
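Video embeddings need the same per-item bookkeeping, except that the temporal dimension of the grid is larger than 1. The worked arithmetic below is a sketch assuming Qwen2-VL's usual preprocessing defaults (patch size 14, temporal patch size 2, merge size 2), frames already resized to multiples of 28 pixels, and an even frame count; the function names are hypothetical.

```python
from typing import Tuple

# Assumed Qwen2-VL preprocessing defaults; verify against the checkpoint's
# preprocessor config before relying on them.
PATCH_SIZE = 14
TEMPORAL_PATCH_SIZE = 2
MERGE_SIZE = 2


def video_grid_thw(num_frames: int, height: int, width: int) -> Tuple[int, int, int]:
    """(t, h, w) patch grid for frames already resized to multiples of 28 px."""
    factor = PATCH_SIZE * MERGE_SIZE
    if height % factor or width % factor:
        raise ValueError(f"height and width must be multiples of {factor}")
    if num_frames % TEMPORAL_PATCH_SIZE:
        raise ValueError(f"num_frames must be a multiple of {TEMPORAL_PATCH_SIZE}")
    return (num_frames // TEMPORAL_PATCH_SIZE, height // PATCH_SIZE, width // PATCH_SIZE)


def num_video_tokens(grid: Tuple[int, int, int]) -> int:
    """Vision tokens the language model sees for one video: t * h * w // merge**2."""
    t, h, w = grid
    return t * h * w // MERGE_SIZE ** 2


grid = video_grid_thw(num_frames=8, height=224, width=224)
print(grid, num_video_tokens(grid))  # (4, 16, 16) 256
```

Under these assumptions, precomputed embeddings for this clip would need 256 rows, paired with a grid of (4, 16, 16), so a mismatch between the embeddings and the grid can be caught before the prompt is expanded.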
Example code