-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for https://github.com/instructlab/training/issues/254 #273
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: ashna000 <[email protected]>
@RobotSail Can you please review this ? |
Signed-off-by: ashna000 <[email protected]>
@@ -76,7 +76,7 @@ def get_effective_samples_per_minibatch(num_tokens_per_gpu): | |||
padding=True, | |||
) | |||
batches = sampler.generate_batches() | |||
return len(dataset) / len(batches) | |||
return len(dataset) / len(batches) if len(batches) > 0 else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're hitting a scenario where Multipack generated no batches, then we need to either error out and have the calling function handle it (by falling back to DistributedDataSampler
, or simply by preventing us from using multipack in the first place.
My recommendation here is to do this:
- Raise an appropriate exception here (either
RuntimeError
orDivisionByZeroError
) - Have the calling function check for this additional exception and fall back to the naive sampler
Thanks for this PR @ashna000, this will be a really important fix ! I've left a comment on how we should handle this problem. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ashna000 can you please squash your commits?
@ashna000 these still need to be squashed |
Added checks for resolving the issue reported in #254