[V1] Add all_token_ids attribute to Request #10135
@@ -324,7 +324,7 @@ def send_to_detokenizer(self, sampled: List[Tuple[Request, int]]) -> None:
         )
         for req, num_tokens in sampled:
             inputs.req_ids.append(req.request_id)
-            if len(req.output_token_ids) == num_tokens:
+            if req.num_output_tokens == num_tokens:
`len` is supported by `ConstantList`, but I changed this for clarity.
Signed-off-by: Woosuk Kwon <[email protected]>
f297a7c to 34a6635
Signed-off-by: Woosuk Kwon <[email protected]> Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Woosuk Kwon <[email protected]> Signed-off-by: Loc Huynh <[email protected]>
Signed-off-by: Woosuk Kwon <[email protected]> Signed-off-by: Sumit Dubey <[email protected]>
Signed-off-by: Woosuk Kwon <[email protected]> Signed-off-by: Maxime Fournioux <[email protected]>
Signed-off-by: Woosuk Kwon <[email protected]> Signed-off-by: Tyler Michael Smith <[email protected]>
This PR adds the `all_token_ids` attribute to the `Request` class, which is always updated together with `output_token_ids`. Having `all_token_ids` is useful because we can operate directly on the entire list of token ids (i.e., `prompt_token_ids + output_token_ids`) without the O(n) computation every time.

To make sure that the two lists are updated atomically, the PR introduces `ConstantList`. The constant list provides an immutable view of the list with O(1) overhead.
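
For readers who want to see the idea in isolation, here is a minimal sketch of the pattern described above. It is not the code from this PR: the `ConstantList` and `Request` classes below are simplified stand-ins, and the `append_output_token_ids` method name and the private `_output_token_ids` / `_all_token_ids` fields are assumptions made for illustration. The sketch shows a read-only wrapper that holds a reference to the underlying list (so creating the view copies nothing) and a single mutation entry point that extends both lists in one place.

```python
from typing import List, Sequence, TypeVar, Union

T = TypeVar("T")


class ConstantList(Sequence[T]):
    """Illustrative read-only view over a list (simplified stand-in)."""

    def __init__(self, x: List[T]) -> None:
        self._x = x  # keep a reference, not a copy, so wrapping is O(1)

    def __getitem__(self, item):
        return self._x[item]

    def __len__(self) -> int:
        return len(self._x)

    # Mutation through the view is rejected explicitly.
    def __setitem__(self, item, value):
        raise TypeError("Cannot modify a constant list")

    def __delitem__(self, item):
        raise TypeError("Cannot modify a constant list")

    def append(self, item):
        raise TypeError("Cannot modify a constant list")


class Request:
    """Illustrative request holding prompt and output token ids."""

    def __init__(self, prompt_token_ids: List[int]) -> None:
        self.prompt_token_ids = prompt_token_ids
        self._output_token_ids: List[int] = []
        # Maintained incrementally instead of concatenating on every access.
        self._all_token_ids: List[int] = list(prompt_token_ids)

    @property
    def output_token_ids(self) -> ConstantList[int]:
        return ConstantList(self._output_token_ids)

    @property
    def all_token_ids(self) -> ConstantList[int]:
        return ConstantList(self._all_token_ids)

    @property
    def num_output_tokens(self) -> int:
        return len(self._output_token_ids)

    def append_output_token_ids(self, token_ids: Union[int, List[int]]) -> None:
        # Hypothetical single entry point: both lists are extended together,
        # so callers cannot update one and forget the other.
        if isinstance(token_ids, int):
            token_ids = [token_ids]
        self._output_token_ids.extend(token_ids)
        self._all_token_ids.extend(token_ids)


# Usage: append through the request, read through the immutable views.
req = Request(prompt_token_ids=[1, 2, 3])
req.append_output_token_ids(4)
req.append_output_token_ids([5, 6])
```

Because the views are created on access and only wrap the existing lists, readers always see the current contents, while any attempt to call `append` or assign through a view raises `TypeError` in this sketch.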