-
-
Notifications
You must be signed in to change notification settings - Fork 5.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
custom allreduce + torch.compile #10121
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
🚀 |
This pull request has merge conflicts that must be resolved before it can be |
vllm/distributed/parallel_state.py
Outdated
else: | ||
torch.distributed.all_reduce(input_, group=self.device_group) | ||
assert pynccl_comm is not None | ||
with pynccl_comm.change_state(enable=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can change pynccl
to be always enabled.
Signed-off-by: youkaichao <[email protected]>
Signed-off-by: youkaichao <[email protected]>
Signed-off-by: youkaichao <[email protected]>
Signed-off-by: youkaichao <[email protected]>
Signed-off-by: youkaichao <[email protected]>
Signed-off-by: youkaichao <[email protected]>
Signed-off-by: youkaichao <[email protected]>
should work now
for profiling size for decode size |
@SageMoore thanks for your pioneering investigation! |
# TODO: pynccl should not use `stream=` | ||
# it can just always use the current stream. | ||
out = pynccl_comm.all_reduce(input_, | ||
stream=torch.cuda.current_stream()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was a little confused about what this TODO meant, so I had to dig a bit.
Looks like PyNcclCommunicator creates a new stream in its __init__
method and uses it by default so we always have to pass in the current stream. Do you know it behaves this way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mostly historical. we can remove it. but i don't want to do it in this pr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I completely agree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me!
Signed-off-by: youkaichao <[email protected]>
Signed-off-by: youkaichao <[email protected]>
Thanks for the help getting this over the line! |
Signed-off-by: youkaichao <[email protected]> Co-authored-by: youkaichao <[email protected]> Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: youkaichao <[email protected]> Co-authored-by: youkaichao <[email protected]>
Signed-off-by: youkaichao <[email protected]> Co-authored-by: youkaichao <[email protected]>
This Pr changes pynccl all reduce to be out of place and removes support for torch distributed's all reduce.