fix: Correct tensor size calculation in TensorSerializer #65
This PR replaces the original tensor size calculation used for bulk writing, which summed the size of each tensor's `UntypedStorage`, with an approach that derives the size from the number of elements and the size of each element.

The issue with the original calculation was that it did not account for views or shared storages. For example, Megatron gives each parameter a `main_grad` tensor that is a view into a larger buffer, so all of the `main_grad` objects share the same storage object. On a 6B model, the previous method returned ~6GiB for each `main_grad` tensor even though the actual tensor is only a fraction of that size; in total it attempted to write 1.3TiB of data for all of the `main_grad` tensors.

The new method computes the tensor size as the actual number of elements (`tensor.nelement()`) multiplied by the size of each element (`tensor.element_size()`), which avoids the overallocation caused by deriving the size from a storage that may be shared.
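For illustration, here is a minimal sketch (not taken from the TensorSerializer codebase; the tensor names are hypothetical) of why a storage-based size overreports for views, and how the element-based calculation avoids it:

```python
import torch

# A small view into a large shared buffer, similar to Megatron's main_grad views.
buffer = torch.zeros(1_000_000)   # large flat buffer (~4 MB of float32)
main_grad = buffer[:1024]         # view that shares the buffer's storage

# Storage-based size: reports the bytes of the *entire* shared storage.
storage_size = main_grad.untyped_storage().nbytes()            # ~4,000,000 bytes

# Element-based size: reports only the bytes the view actually covers.
actual_size = main_grad.nelement() * main_grad.element_size()  # 4,096 bytes

print(storage_size, actual_size)
```

Summing the element-based size over many such views scales with the data actually written, whereas summing the storage size counts the shared buffer once per view, which is what produced the 1.3TiB estimate described above.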