[1/N] Initial prototype for multi-modal processor #10044
Conversation
def candidate_placeholders(
    tokenizer: AnyTokenizer,
    placeholder_text: str,
) -> Collection[List[int]]:
    """Generate token ID sequences that may represent a placeholder text."""
    # When the placeholder text is not mapped to a special token ID,
    # it may be tokenized differently based on whether it is at the start/end
    # of the string. So, we go through each combination of whether the text
    # is at the start and end boundaries of the string

    # Matches the placeholder when it is in the middle of the string
    start_id, = encode_no_special_tokens(tokenizer, "a")
    end_id, = encode_no_special_tokens(tokenizer, "b")

    candidate_basic = encode_no_special_tokens(tokenizer, placeholder_text)

    start_id_, *candidate_a = encode_no_special_tokens(
        tokenizer,
        f"a{placeholder_text}",
    )
    assert start_id == start_id_

    start_id_, *candidate_ab, end_id_ = encode_no_special_tokens(
        tokenizer,
        f"a{placeholder_text}b",
    )
    assert start_id == start_id_ and end_id == end_id_

    *candidate_b, end_id_ = encode_no_special_tokens(
        tokenizer,
        f"{placeholder_text}b",
    )
    assert end_id == end_id_

    # Remove duplicates (need to convert to tuple to be hashable)
    unique_candidates = {
        tuple(c)
        for c in [candidate_basic, candidate_a, candidate_ab, candidate_b]
    }

    # Convert back to list
    return [list(c) for c in unique_candidates]
A generalization of our existing code for Phi-3V
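A minimal usage sketch of the function above, assuming encode_no_special_tokens simply calls the tokenizer with add_special_tokens=False; the model name and the "<image>" placeholder text are arbitrary examples, not part of this PR:

from typing import List

from transformers import AutoTokenizer

def encode_no_special_tokens(tokenizer, text: str) -> List[int]:
    # Hypothetical stand-in for the helper used by candidate_placeholders
    return tokenizer.encode(text, add_special_tokens=False)

tokenizer = AutoTokenizer.from_pretrained("llava-hf/llava-1.5-7b-hf")

# If "<image>" is not mapped to a single special token, it may be split
# differently at the start, middle, or end of a string; each distinct
# tokenization becomes one candidate sequence.
for candidate in candidate_placeholders(tokenizer, "<image>"):
    print(candidate)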
def apply_placeholders(
    token_ids: List[int],
    placeholder_ids: List[int],
    get_replacement_ids: Callable[[], List[int]],
) -> Optional[PlaceholderRange]:
    """
    Find the first occurrence of :code:`placeholder_ids`,
    and replace it with the output of :code:`get_replacement_ids`.

    This function updates :code:`token_ids` in place.
    """
    placeholder_length = len(placeholder_ids)

    for start_idx in range(len(token_ids) - placeholder_length + 1):
        end_idx = start_idx + placeholder_length

        # Compare the full window [start_idx, end_idx) against the placeholder
        if token_ids[start_idx:end_idx] == placeholder_ids:
            token_ids[start_idx:end_idx] = get_replacement_ids()

            return PlaceholderRange(offset=start_idx,
                                    length=placeholder_length)

    return None
A generalization of our existing code for HF Pixtral
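A toy example of the in-place behaviour (the token IDs below are made up for illustration; PlaceholderRange is the offset/length structure referenced in the code above):

# Token ID 32000 stands in for an image placeholder token
token_ids = [1, 200, 32000, 300]

placeholder_range = apply_placeholders(
    token_ids,
    placeholder_ids=[32000],
    # Expand the placeholder into three feature slots
    get_replacement_ids=lambda: [32000, 32000, 32000],
)

# token_ids was updated in place to [1, 200, 32000, 32000, 32000, 300];
# the returned range records offset=2 and the matched placeholder's length
print(token_ids, placeholder_range)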
# Processed by input processor
if isinstance(data, BatchFeature):
    return MultiModalKwargs(data.data)
This was added by the Llama-3.2 PR, but I found that this model no longer uses the HF processor inside vLLM's input processor, so it should be safe to remove.
LGTM!
Seems that the v1 entrypoint test is failing: https://buildkite.com/vllm/fastcheck/builds/7789#01931e26-10d6-4960-b792-e949972b2aef/188-1114
This PR adds the core code for the multi-modal processor while maintaining backward compatibility. The main purpose of this PR is to "reserve" code changes (mainly the dependencies of the multi-modal processor) so as to reduce the risk of merge conflicts caused by subsequent PRs.
Note that no models currently use the new multi-modal processor - I will implement a few of them in the next PR. As such, the details of the multi-modal processor are still subject to change.
Part of #10114
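For a rough idea of how the helpers above could be combined by a model-specific processor, here is an illustrative sketch; the "<image>" placeholder text, the image_feature_size parameter, and the token-repetition convention are assumptions for the example, not part of this PR:

from typing import List, Optional, Tuple

def expand_image_placeholder(
    tokenizer,  # AnyTokenizer
    prompt: str,
    image_feature_size: int,
) -> Tuple[List[int], Optional[PlaceholderRange]]:
    """Illustrative only: tokenize the prompt, then expand the first
    "<image>" placeholder into image_feature_size repeated tokens."""
    token_ids = encode_no_special_tokens(tokenizer, prompt)

    for placeholder_ids in candidate_placeholders(tokenizer, "<image>"):
        # Repeat the first placeholder token once per image feature;
        # real models may insert model-specific replacement tokens instead
        replacement_ids = [placeholder_ids[0]] * image_feature_size

        placeholder_range = apply_placeholders(
            token_ids,
            placeholder_ids,
            get_replacement_ids=lambda: replacement_ids,
        )
        if placeholder_range is not None:
            return token_ids, placeholder_range

    return token_ids, None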