[V1] Multiprocessing Tensor Parallel Support for v1 #9856
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
@tlrmchlsmth Thanks for the great work! Let me know when it is ready for review.
This pull request has merge conflicts that must be resolved before it can be merged.
@tlrmchlsmth this looks like a great start and is in the right direction imo. I have been thinking a lot though about how to streamline some things in V1... going to dump some of those ideas here and we can discuss more next week.
Mostly orthogonal to the above, I also feel we should rethink the executor abstraction / class hierarchy to better isolate the accelerator-specific aspects. There's a significant amount of duplicated logic right now across different executors (and even workers), and I think we could unify/consolidate a lot of that.
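As a rough illustration of the kind of consolidation being suggested (the class and method names below are hypothetical, not vLLM's actual hierarchy), the shared orchestration logic could live in a base executor, with only the accelerator-specific worker creation overridden per backend:

```python
# Hypothetical sketch, not vLLM's real class hierarchy: common logic in a
# base executor, accelerator-specific pieces behind a small overridable hook.
from abc import ABC, abstractmethod
from typing import Any


class ExecutorBase(ABC):
    """Shared orchestration logic: worker setup, RPC fan-out, shutdown."""

    def __init__(self, vllm_config: Any):
        self.vllm_config = vllm_config
        self.workers = self._create_workers()

    @abstractmethod
    def _create_workers(self) -> list:
        """Accelerator-specific: create GPU/CPU/TPU workers."""

    def collective_rpc(self, method: str, *args: Any, **kwargs: Any) -> list:
        """Common fan-out of a method call to every worker."""
        return [getattr(w, method)(*args, **kwargs) for w in self.workers]

    def execute_model(self, scheduler_output: Any) -> Any:
        # Driver-side logic that is identical across accelerators.
        return self.collective_rpc("execute_model", scheduler_output)[0]


class GPUMultiprocExecutor(ExecutorBase):
    def _create_workers(self) -> list:
        # GPU-specific worker-process creation would live here.
        raise NotImplementedError
```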
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 12b8813 to 34ca6bb
Force-pushed from 34ca6bb to 49869fa
Do folks think I should go ahead with this in the current PR? I know @youkaichao had some concerns about ease of debugging if we always put the worker in a separate process. Not sure I see a benefit for this implementation, beyond consolidating the implementation and reducing code size. For asynchronous scheduling, it makes a whole lot of sense.
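For context on the trade-off, here is a minimal, self-contained sketch (the names are illustrative, not the actual V1 worker code) of what an always-out-of-process worker looks like: calls go over an IPC queue instead of direct method calls, which is what makes interactive debugging (pdb, stack inspection) harder than with an in-process worker:

```python
import multiprocessing as mp


def worker_main(task_queue, result_queue):
    # Stand-in for a real worker loop; a real worker would dispatch each
    # (method, args, kwargs) tuple to its model runner.
    while True:
        method, args, kwargs = task_queue.get()
        if method == "shutdown":
            break
        result_queue.put(f"ran {method}")


if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    tasks, results = ctx.Queue(), ctx.Queue()
    proc = ctx.Process(target=worker_main, args=(tasks, results))
    proc.start()
    tasks.put(("execute_model", (), {}))
    print(results.get())             # -> "ran execute_model"
    tasks.put(("shutdown", (), {}))
    proc.join()
```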
@tlrmchlsmth I think it's fine to defer it.
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 9f561ac to 50a12bc
Switched initialization over to use
Thanks for the great efforts! The PR looks good to me in general. I left several new comments, but they should be easy to address.
I do want to discuss one more thing about the initialization process tomorrow.
Thanks for the great work! I don't have any major concerns now; there is some follow-up work, but we don't necessarily need to do it in this PR.
Before merging, please fix the nit comments about `vllm.envs`, and change the signature of `collective_rpc` to `collective_rpc(self, method, timeout, *args, **kwargs)`. Thanks.
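For reference, a minimal sketch of the requested call shape (the class name and the default value for `timeout` are assumptions here, not the final implementation):

```python
from typing import Any, Optional


class MultiprocExecutor:
    """Illustrative shell only; the class name is an assumption."""

    def collective_rpc(self,
                       method: str,
                       timeout: Optional[float] = None,
                       *args: Any,
                       **kwargs: Any) -> list:
        """Invoke `method` on every worker, waiting up to `timeout`
        seconds for each response, and return the per-worker results."""
        raise NotImplementedError  # body omitted in this sketch


# Example call shape: timeout is passed explicitly, and the remaining
# positional/keyword arguments are forwarded to the worker method.
# executor.collective_rpc("initialize_cache", 30.0, num_gpu_blocks=8192)
```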
- update collective_rpc interface
- change run_on_both_enginges pytorch fixture
LGTM to merge now.
Implementation of tensor parallel support for V1.
Some differences from V0:
Some benchmarks are below, using Python 3.12. I'm running Llama3-8B on 2 A100s, which will really make any overheads stand out.
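For anyone wanting to try a similar setup, a hedged example of launching that configuration offline is below. It assumes the V1 engine is still opt-in via the `VLLM_USE_V1` environment variable; the model id and sampling parameters are placeholders rather than the exact settings used for the reported numbers:

```python
import os

# Assumption: V1 is selected via this env var rather than being the default.
os.environ["VLLM_USE_V1"] = "1"

from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B",  # placeholder model id
    tensor_parallel_size=2,              # shard across 2 A100s
)
params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```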