[1/N] Initial prototype for multi-modal processor #10044
Conversation
def candidate_placeholders(
    tokenizer: AnyTokenizer,
    placeholder_text: str,
) -> Collection[List[int]]:
    """Generate token ID sequences that may represent a placeholder text."""
    # When the placeholder text is not mapped to a special token ID,
    # it may be tokenized differently based on whether it is at the start/end
    # of the string. So, we go through each combination of whether the text
    # is at the start and end boundaries of the string

    # Matches the placeholder when it is in the middle of the string
    start_id, = encode_no_special_tokens(tokenizer, "a")
    end_id, = encode_no_special_tokens(tokenizer, "b")

    candidate_basic = encode_no_special_tokens(tokenizer, placeholder_text)

    start_id_, *candidate_a = encode_no_special_tokens(
        tokenizer,
        f"a{placeholder_text}",
    )
    assert start_id == start_id_

    start_id_, *candidate_ab, end_id_ = encode_no_special_tokens(
        tokenizer,
        f"a{placeholder_text}b",
    )
    assert start_id == start_id_ and end_id == end_id_

    *candidate_b, end_id_ = encode_no_special_tokens(
        tokenizer,
        f"{placeholder_text}b",
    )
    assert end_id == end_id_

    # Remove duplicates (need to convert to tuple to be hashable)
    unique_candidates = {
        tuple(c)
        for c in [candidate_basic, candidate_a, candidate_ab, candidate_b]
    }

    # Convert back to list
    return [list(c) for c in unique_candidates]
A generalization of our existing code for Phi-3V
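A minimal usage sketch of the function above, assuming encode_no_special_tokens simply calls the tokenizer with add_special_tokens=False; the model name and the "<image>" placeholder text are arbitrary examples, not part of this PR:

from typing import List

from transformers import AutoTokenizer

def encode_no_special_tokens(tokenizer, text: str) -> List[int]:
    # Hypothetical stand-in for the helper used by candidate_placeholders
    return tokenizer.encode(text, add_special_tokens=False)

tokenizer = AutoTokenizer.from_pretrained("llava-hf/llava-1.5-7b-hf")

# If "<image>" is not mapped to a single special token, it may be split
# differently at the start, middle, or end of a string; each distinct
# tokenization becomes one candidate sequence.
for candidate in candidate_placeholders(tokenizer, "<image>"):
    print(candidate)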
def apply_placeholders(
    token_ids: List[int],
    placeholder_ids: List[int],
    get_replacement_ids: Callable[[], List[int]],
) -> Optional[PlaceholderRange]:
    """
    Find the first occurrence of :code:`placeholder_ids`,
    and replace it with the output of :code:`get_replacement_ids`.

    This function updates :code:`token_ids` in place.
    """
    placeholder_length = len(placeholder_ids)

    for start_idx in range(len(token_ids) - placeholder_length + 1):
        end_idx = start_idx + placeholder_length

        # Compare the full window [start_idx, end_idx) against the placeholder
        if token_ids[start_idx:end_idx] == placeholder_ids:
            token_ids[start_idx:end_idx] = get_replacement_ids()

            return PlaceholderRange(offset=start_idx,
                                    length=placeholder_length)

    return None
A generalization of our existing code for HF Pixtral
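A toy example of the in-place behaviour (the token IDs below are made up for illustration; PlaceholderRange is the offset/length structure referenced in the code above):

# Token ID 32000 stands in for an image placeholder token
token_ids = [1, 200, 32000, 300]

placeholder_range = apply_placeholders(
    token_ids,
    placeholder_ids=[32000],
    # Expand the placeholder into three feature slots
    get_replacement_ids=lambda: [32000, 32000, 32000],
)

# token_ids was updated in place to [1, 200, 32000, 32000, 32000, 300];
# the returned range records offset=2 and the matched placeholder's length
print(token_ids, placeholder_range)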
# Processed by input processor
if isinstance(data, BatchFeature):
    return MultiModalKwargs(data.data)
This was added by the Llama-3.2 PR, but I found that this model no longer uses the HF processor inside vLLM's input processor, so it should be safe to remove.
LGTM!
Seems that the v1 entrypoint test is failing: https://buildkite.com/vllm/fastcheck/builds/7789#01931e26-10d6-4960-b792-e949972b2aef/188-1114
This PR adds the core code for the multi-modal processor while maintaining backward compatibility. The main purpose of this PR is to "reserve" code changes (mainly the dependencies of the multi-modal processor) so as to reduce the risk of merge conflicts caused by subsequent PRs.
Note that no models currently use the new multi-modal processor - I will implement a few of them in the next PR. As such, the details of the multi-modal processor are still subject to change.
Part of #10114
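For a rough idea of how the helpers above could be combined by a model-specific processor, here is an illustrative sketch; the "<image>" placeholder text, the image_feature_size parameter, and the token-repetition convention are assumptions for the example, not part of this PR:

from typing import List, Optional, Tuple

def expand_image_placeholder(
    tokenizer,  # AnyTokenizer
    prompt: str,
    image_feature_size: int,
) -> Tuple[List[int], Optional[PlaceholderRange]]:
    """Illustrative only: tokenize the prompt, then expand the first
    "<image>" placeholder into image_feature_size repeated tokens."""
    token_ids = encode_no_special_tokens(tokenizer, prompt)

    for placeholder_ids in candidate_placeholders(tokenizer, "<image>"):
        # Repeat the first placeholder token once per image feature;
        # real models may insert model-specific replacement tokens instead
        replacement_ids = [placeholder_ids[0]] * image_feature_size

        placeholder_range = apply_placeholders(
            token_ids,
            placeholder_ids,
            get_replacement_ids=lambda: replacement_ids,
        )
        if placeholder_range is not None:
            return token_ids, placeholder_range

    return token_ids, None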