addressing the case when output region for repeat operation is too big #386
base: branch-24.03
@@ -2046,7 +2046,6 @@ def repeat(a, repeats, axis=None):
     --------
     Multiple GPUs, Multiple CPUs
     """
-
     # when array is a scalar
     if np.ndim(a) == 0:
         if np.ndim(repeats) == 0:
@@ -2075,7 +2074,7 @@ def repeat(a, repeats, axis=None):
         axis = np.int32(axis)

     if axis >= array.ndim:
-        return ValueError("axis exceeds dimension of the input array")
+        raise ValueError("axis exceeds dimension of the input array")

     # If repeats is on a zero sized axis, then return the array.
     if array.shape[axis] == 0:
@@ -2100,11 +2099,36 @@ def repeat(a, repeats, axis=None):
             category=UserWarning,
         )
         repeats = np.int64(repeats)
-        result = array._thunk.repeat(
-            repeats=repeats,
-            axis=axis,
-            scalar_repeats=True,
-        )
+        if repeats < 0:
+            raise ValueError(
+                "'repeats' should not be negative: {}".format(repeats)
+            )
+
+        # check output shape (if it will fit on the GPU or not)
+        out_shape = list(array.shape)
+        out_shape[axis] *= repeats
+        out_shape = tuple(out_shape)
+        size = sum(out_shape) * array.itemsize
+        # check if the size of the output array is less than 8GB. In
+        # this case we can use output regions, otherwise we will use a
+        # statically allocated array
+        if size < 8589934592 / 2:
A bunch of comments about this, going from lower- to higher-level:

- This seems to be testing for 4GB, not 8GB? This should be a named constant, and ideally written in a form that makes the intended size obvious.
- This limit is not considering the available memory. 8GB may be too large or too little depending on the memory. This number should probably be a percentage of the relevant available memory.
- This is considering the full size of the array, not the size of each chunk. E.g. 16GB may be totally fine if split across 8 GPUs.
- It seems to me that the only real decision we're making here is whether to perform the operation using an eager output or a deferred output. Therefore, we want to also be querying the (relative) sizes of the eager and deferred pools. Ideally we would also consider the current/projected load on each pool, which is not possible right now, but might be possible in the future, if legate.core takes over more instance management responsibilities.
- Finally, AFAIK the unification of eager and deferred pools is on the Legion roadmap. If that happens, then we could safely always use the more efficient eager implementation. @lightsighter how far in the future do you think this is? If nobody has complained about this issue, we may want to wait until unification lands.
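To make the first two points concrete, here is a minimal sketch of what a named-constant check could look like. EAGER_LIMIT_BYTES, fits_in_output_region, and the use of math.prod for the byte count are illustrative assumptions, not code from this PR:

    import math

    # Hypothetical named constant, written so the intended size is
    # self-evident: 4 GiB, which is what 8589934592 / 2 actually tests for.
    EAGER_LIMIT_BYTES = 4 * 1024**3

    def fits_in_output_region(out_shape, itemsize, limit=EAGER_LIMIT_BYTES):
        # The byte size of an array is the *product* of its extents times
        # the item size; note the diff above computes sum(out_shape) instead.
        size_bytes = math.prod(out_shape) * itemsize
        return size_bytes < limit

A more robust version would derive the limit from the memory actually available to the relevant pool, per the comment above, rather than hard-coding it.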
+            result = array._thunk.repeat(
+                repeats=repeats, axis=axis, scalar_repeats=True
+            )
+        else:
+            # this implementation is taken from CuPy
+            result = ndarray(shape=out_shape, dtype=array.dtype)
+            a_index = [slice(None)] * len(out_shape)
+            res_index = list(a_index)
+            offset = 0
+            for i in range(a._shape[axis]):
+                a_index[axis] = slice(i, i + 1)
+                res_index[axis] = slice(offset, offset + repeats)
+                result[res_index] = array[a_index]
+                offset += repeats
Comment on lines +2126 to +2130

I don't love this. We are emitting a separate operation for each slice. It would be more efficient if we could manually partition the output. However, this would require support for manual-coloring partitioning from the core. @magnatelee is this something that's reasonable?

I don't think anything that does coloring would be any more scalable than the original code. And it is in some sense worse, as it takes away the core's ability to reason about the partitioning. For example, if the core saw multiple tasks operating on disjoint parts of the same array, it could potentially partition and map them in a way that those tasks are distributed in a balanced manner.
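For reference, here is a self-contained NumPy rendering of the CuPy-style fallback under discussion (repeat_by_slices is a hypothetical name; the PR itself operates on cuNumeric ndarrays and thunks, not NumPy arrays). It makes the concern visible: one slice assignment, and hence one emitted operation, per index along the repeated axis:

    import numpy as np

    def repeat_by_slices(a, repeats, axis):
        # Normalize scalar or per-element repeats to one count per index.
        repeats = np.broadcast_to(repeats, (a.shape[axis],))
        out_shape = list(a.shape)
        out_shape[axis] = int(repeats.sum())
        result = np.empty(tuple(out_shape), dtype=a.dtype)
        a_index = [slice(None)] * a.ndim
        res_index = list(a_index)
        offset = 0
        for i in range(a.shape[axis]):  # N iterations => N copy operations
            a_index[axis] = slice(i, i + 1)
            res_index[axis] = slice(offset, offset + repeats[i])
            # A width-1 slice broadcasts across the repeats[i]-wide window.
            result[tuple(res_index)] = a[tuple(a_index)]
            offset += repeats[i]
        return result

    a = np.arange(6).reshape(2, 3)
    assert np.array_equal(repeat_by_slices(a, 2, 0), np.repeat(a, 2, axis=0))
    assert np.array_equal(
        repeat_by_slices(a, [1, 2, 3], 1), np.repeat(a, [1, 2, 3], axis=1)
    )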
+            return result
     # repeats is an array
     else:
         # repeats should be integer type
@@ -2115,10 +2139,32 @@ def repeat(a, repeats, axis=None):
             )
             repeats = repeats.astype(np.int64)
         if repeats.shape[0] != array.shape[axis]:
-            return ValueError("incorrect shape of repeats array")
-        result = array._thunk.repeat(
-            repeats=repeats._thunk, axis=axis, scalar_repeats=False
-        )
+            raise ValueError("incorrect shape of repeats array")
+
+        # check output shape (if it will fit on the GPU or not)
+        out_shape = list(array.shape)
+        n_repeats = sum(repeats)
+        out_shape[axis] = n_repeats
+        out_shape = tuple(out_shape)
+        size = sum(out_shape) * array.itemsize
+        # check if the size of the output array is less than 8GB. In
+        # this case we can use output regions, otherwise we will use a
+        # statically allocated array
+        if size < 8589934592 / 2:
+            result = array._thunk.repeat(
+                repeats=repeats._thunk, axis=axis, scalar_repeats=False
+            )
+        else:  # this implementation is taken from CuPy
+            result = ndarray(shape=out_shape, dtype=array.dtype)
+            a_index = [slice(None)] * len(out_shape)
+            res_index = list(a_index)
+            offset = 0
+            for i in range(a._shape[axis]):
+                a_index[axis] = slice(i, i + 1)
+                res_index[axis] = slice(offset, offset + repeats[i])
+                result[res_index] = array[a_index]
+                offset += repeats[i]
+            return result
     return ndarray(shape=result.shape, thunk=result)
Please use f-strings in new code.
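For example, the negative-repeats check above could be written as follows (an illustrative rewrite, not part of the diff):

    if repeats < 0:
        raise ValueError(f"'repeats' should not be negative: {repeats}")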
And use raise to raise an exception instead of return.
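A minimal illustration of why this matters, using hypothetical helper functions: returning the exception hands the caller a ValueError object as an ordinary value, while raising it actually signals the error.

    def bad(axis, ndim):
        if axis >= ndim:
            # The exception object is returned as a value; no error occurs.
            return ValueError("axis exceeds dimension of the input array")

    def good(axis, ndim):
        if axis >= ndim:
            # Execution stops here and the error propagates to the caller.
            raise ValueError("axis exceeds dimension of the input array")

    print(bad(3, 2))   # prints the ValueError instance and continues silently
    good(3, 2)         # actually raises ValueError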