[FEA] hook-based support for distributed but shared parameter #243
Conversation
This PR looks good to me; I suggested some relatively minor changes.
The only other major comment I have is to also evaluate DDP communication hooks versus these per-tensor gradient hooks, and whether that approach is preferable given its explicit non-blocking communication routines.
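For comparison, a minimal sketch of the DDP communication-hook alternative mentioned above, assuming a standard `torch.distributed` setup. The hook body follows PyTorch's built-in all-reduce comm-hook pattern; the usage line and group name are illustrative, not part of this PR:

```python
import torch
import torch.distributed as dist

def allreduce_hook(process_group, bucket: dist.GradBucket) -> torch.futures.Future:
    # Launch a non-blocking all-reduce on the bucket's flattened gradients
    # and return a future that DDP awaits, overlapping communication with
    # the remainder of the backward pass.
    group = process_group if process_group is not None else dist.group.WORLD
    world_size = dist.get_world_size(group)
    tensor = bucket.buffer().div_(world_size)
    fut = dist.all_reduce(tensor, group=group, async_op=True).get_future()
    return fut.then(lambda f: f.value()[0])

# Hypothetical usage:
# ddp_model.register_comm_hook(model_parallel_group, allreduce_hook)
```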
/blossom-ci
Modulus Pull Request
Description
#235 asks for support for distributed parameters that are shared across the model-parallel group. The initial proposal was a wrapper-based approach; the hook-based approach introduced in this draft should be less intrusive and more flexible. Shared weights are simply marked or unmarked as needed, which registers (or removes) a gradient hook that takes care of the necessary reduction of gradients.
Closes #235, but with a different implementation approach.
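As a rough illustration of the marking mechanism described above (not the PR's actual API: the function and attribute names here are hypothetical, and an initialized model-parallel process group is assumed):

```python
import torch
import torch.distributed as dist

def mark_shared_parameter(param: torch.nn.Parameter, mp_group) -> None:
    """Register a gradient hook that sums the gradient across mp_group."""
    def _allreduce_grad(grad: torch.Tensor) -> torch.Tensor:
        # Reduce the gradient across all ranks sharing this parameter so
        # every replica applies the same update.
        dist.all_reduce(grad, op=dist.ReduceOp.SUM, group=mp_group)
        return grad

    # Keep the handle on the parameter so the hook can later be removed.
    param._shared_grad_hook = param.register_hook(_allreduce_grad)

def unmark_shared_parameter(param: torch.nn.Parameter) -> None:
    """Remove a previously registered shared-gradient hook, if any."""
    handle = getattr(param, "_shared_grad_hook", None)
    if handle is not None:
        handle.remove()
        del param._shared_grad_hook
```

Note that the `all_reduce` here is blocking; the DDP communication-hook route discussed in the review is the natural way to make this reduction non-blocking.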
Checklist
Dependencies