
[Bugfix] Embedding model pooling_type equals ALL and multi input's bug #10494

Merged: 4 commits into vllm-project:main, Nov 21, 2024

Conversation

@BBuf (Contributor) commented Nov 20, 2024

When I run the following example code, it triggers an error:

from openai import OpenAI

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    # defaults to os.environ.get("OPENAI_API_KEY")
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id

responses = client.embeddings.create(
    input=[
        "Hello my name is",
        "The best thing about vLLM is that it supports many different models"
    ],
    model=model,
)

for data in responses.data:
    print(data.embedding)  # list of float of len 4096
INFO 11-20 19:56:18 engine.py:267] Added request embd-1ffc4573aa2b4189b2e2cd3a413127cd-0.
INFO 11-20 19:56:18 engine.py:267] Added request embd-1ffc4573aa2b4189b2e2cd3a413127cd-1.
CRITICAL 11-20 19:56:18 launcher.py:99] MQLLMEngine is already dead, terminating server process
INFO:     127.0.0.1:34172 - "POST /v1/embeddings HTTP/1.1" 500 Internal Server Error
ERROR 11-20 19:56:18 engine.py:135] RuntimeError('stack expects each tensor to be equal size, but got [3, 1] at entry 0 and [1, 1] at entry 1')
ERROR 11-20 19:56:18 engine.py:135] Traceback (most recent call last):
ERROR 11-20 19:56:18 engine.py:135]   File "/mnt/data/vllm/vllm/engine/multiprocessing/engine.py", line 133, in start
ERROR 11-20 19:56:18 engine.py:135]     self.run_engine_loop()
ERROR 11-20 19:56:18 engine.py:135]   File "/mnt/data/vllm/vllm/engine/multiprocessing/engine.py", line 196, in run_engine_loop
ERROR 11-20 19:56:18 engine.py:135]     request_outputs = self.engine_step()
ERROR 11-20 19:56:18 engine.py:135]   File "/mnt/data/vllm/vllm/engine/multiprocessing/engine.py", line 214, in engine_step
ERROR 11-20 19:56:18 engine.py:135]     raise e
ERROR 11-20 19:56:18 engine.py:135]   File "/mnt/data/vllm/vllm/engine/multiprocessing/engine.py", line 205, in engine_step
ERROR 11-20 19:56:18 engine.py:135]     return self.engine.step()
ERROR 11-20 19:56:18 engine.py:135]   File "/mnt/data/vllm/vllm/engine/llm_engine.py", line 1454, in step
ERROR 11-20 19:56:18 engine.py:135]     outputs = self.model_executor.execute_model(
ERROR 11-20 19:56:18 engine.py:135]   File "/mnt/data/vllm/vllm/executor/gpu_executor.py", line 125, in execute_model
ERROR 11-20 19:56:18 engine.py:135]     output = self.driver_worker.execute_model(execute_model_req)
ERROR 11-20 19:56:18 engine.py:135]   File "/mnt/data/vllm/vllm/worker/worker_base.py", line 343, in execute_model
ERROR 11-20 19:56:18 engine.py:135]     output = self.model_runner.execute_model(
ERROR 11-20 19:56:18 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 11-20 19:56:18 engine.py:135]     return func(*args, **kwargs)
ERROR 11-20 19:56:18 engine.py:135]   File "/mnt/data/vllm/vllm/worker/embedding_model_runner.py", line 138, in execute_model
ERROR 11-20 19:56:18 engine.py:135]     self.model.pooler(hidden_states=hidden_or_intermediate_states,
ERROR 11-20 19:56:18 engine.py:135]   File "/mnt/data/vllm/vllm/model_executor/models/qwen2_rm.py", line 155, in pooler
ERROR 11-20 19:56:18 engine.py:135]     return self._pooler(hidden_states, pooling_metadata)
ERROR 11-20 19:56:18 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 11-20 19:56:18 engine.py:135]     return self._call_impl(*args, **kwargs)
ERROR 11-20 19:56:18 engine.py:135]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 11-20 19:56:18 engine.py:135]     return forward_call(*args, **kwargs)
ERROR 11-20 19:56:18 engine.py:135]   File "/mnt/data/vllm/vllm/model_executor/layers/pooler.py", line 104, in forward
ERROR 11-20 19:56:18 engine.py:135]     pooled_data = torch.stack(pooled_data_lst)
ERROR 11-20 19:56:18 engine.py:135] RuntimeError: stack expects each tensor to be equal size, but got [3, 1] at entry 0 and [1, 1] at entry 1
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [52561]

I found that the bug was introduced in this commit. Reverting this PR's pooling_type='ALL' change to the original logic allows the embedding model to correctly infer multiple prompts of different lengths.
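For reference, here is a minimal standalone sketch (plain PyTorch, not vLLM code) of the failure: with pooling_type='ALL', each prompt pools to one tensor per token, so the per-prompt tensors have different first dimensions and torch.stack over them cannot work.

import torch

# Shapes taken from the error log above: one prompt pooled to [3, 1],
# the other to [1, 1], because they contain different numbers of tokens.
pooled_data_lst = [
    torch.zeros(3, 1),
    torch.zeros(1, 1),
]

# RuntimeError: stack expects each tensor to be equal size,
# but got [3, 1] at entry 0 and [1, 1] at entry 1
torch.stack(pooled_data_lst)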


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which starts a small and essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@DarkLight1337 (Member)

I changed it to avoid an error when calling normalize and softmax later on:

        if self.normalize:
            pooled_data = nn.functional.normalize(pooled_data, p=2, dim=1)
        if self.softmax:
            pooled_data = nn.functional.softmax(pooled_data, dim=-1)

Didn't realize that the shapes per element can be different though, thanks for fixing! I guess we need to handle pooled_data being either a tensor or a list when applying normalize and softmax.
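For context, a tiny illustration (plain PyTorch, not the vLLM code path) of the error the stacking was meant to avoid: the functional ops expect a single Tensor, so passing the raw list of per-prompt tensors fails.

import torch
import torch.nn as nn

pooled_data = [torch.randn(3, 1), torch.randn(1, 1)]  # a list, not a Tensor

try:
    nn.functional.normalize(pooled_data, p=2, dim=1)
except Exception as e:
    # Fails because normalize operates on a Tensor, not a Python list.
    print(type(e).__name__, e)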

@BBuf (Contributor, Author) commented Nov 21, 2024

> I changed it to avoid an error when calling normalize and softmax later on:
>
>         if self.normalize:
>             pooled_data = nn.functional.normalize(pooled_data, p=2, dim=1)
>         if self.softmax:
>             pooled_data = nn.functional.softmax(pooled_data, dim=-1)
>
> Didn't realize that the shapes per element can be different though, thanks for fixing! I guess we need to handle pooled_data being either a tensor or a list when applying normalize and softmax.

Ok, I will handle the case of a list of tensors later.

@BBuf (Contributor, Author) commented Nov 21, 2024

> I changed it to avoid an error when calling normalize and softmax later on:
>
>         if self.normalize:
>             pooled_data = nn.functional.normalize(pooled_data, p=2, dim=1)
>         if self.softmax:
>             pooled_data = nn.functional.softmax(pooled_data, dim=-1)
>
> Didn't realize that the shapes per element can be different though, thanks for fixing! I guess we need to handle pooled_data being either a tensor or a list when applying normalize and softmax.

I have reverted the pooling_type=STEP case to return a list of tensors as well. Additionally, I have added checks so that a list of tensors is handled separately during normalization and softmax.
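Roughly, the added checks could take the shape of the following sketch (a hypothetical helper for illustration, not the exact diff in this PR): dispatch elementwise when pooled_data is a list of per-prompt tensors, and operate on the batched tensor directly otherwise.

from typing import List, Union

import torch
import torch.nn as nn

def postprocess(pooled_data: Union[torch.Tensor, List[torch.Tensor]],
                normalize: bool, softmax: bool):
    # Apply normalize/softmax to one batched tensor, or to each tensor
    # in a list of per-prompt tensors with differing lengths.
    def apply(t: torch.Tensor) -> torch.Tensor:
        if normalize:
            t = nn.functional.normalize(t, p=2, dim=1)
        if softmax:
            t = nn.functional.softmax(t, dim=-1)
        return t

    if isinstance(pooled_data, list):
        return [apply(t) for t in pooled_data]
    return apply(pooled_data)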

@DarkLight1337 (Member)

Looks good, but please fix the lint errors.

@DarkLight1337 DarkLight1337 changed the title fix embeeding model pooling_type equals ALL and multi input's bug [Bugfix] Embedding model pooling_type equals ALL and multi input's bug Nov 21, 2024
@DarkLight1337 (Member)

Thanks for fixing!

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) November 21, 2024 08:49
@github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Nov 21, 2024
@BBuf (Contributor, Author) commented Nov 21, 2024

> Thanks for fixing!

But CI has failed ([screenshot of the failing check]). Is this unrelated to this PR?

@DarkLight1337 (Member)

Let me retry the test.

@DarkLight1337 DarkLight1337 merged commit 4d676f0 into vllm-project:main Nov 21, 2024
63 checks passed
tlrmchlsmth pushed a commit to neuralmagic/vllm that referenced this pull request Nov 23, 2024
mfournioux pushed a commit to mfournioux/vllm that referenced this pull request Nov 28, 2024