-
Notifications
You must be signed in to change notification settings - Fork 91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace volatile
by proper memory ordering in HIP
#1472
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you have the documentation about rocm-clang support gcc atomic on GPU?
Is there only one compiler rocm-clang from hipcc on AMD GPU?
This is based on communications with an AMD engineer, and the fact that |
cb10ce0
to
cb5fee9
Compare
cb5fee9
to
c0027e6
Compare
Looks like the intrinsics are not supported by older HIP versions yet. I'll fall back on the GCC versions then |
b989041
to
0fef39f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the code self, LGTM.
I am still not sure whether gcc intrinsics work on GPU address. compiling to libstdc++ only mean for CPU side not GPU, right?
} | ||
|
||
|
||
template <typename ValueType> | ||
__device__ __forceinline__ ValueType load_relaxed_shared(const ValueType* ptr) | ||
__device__ __forceinline__ thrust::complex<ValueType> load_relaxed( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe only for thrust::complex<double>
?
thrust::complex<float>
can be done with int64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't follow, we only use thrust::complex with float value types Formatting issues with templates. We don't need full atomicity with thrust::complex
, only element-wise, so this allows a more efficient code generation. Maybe the compiler even combines them? Doesn't matter much
@@ -6,6 +6,7 @@ | |||
#define GKO_HIP_COMPONENTS_MEMORY_HIP_HPP_ | |||
|
|||
|
|||
#include <cstring> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
memcpy
is defined in string.h, not sure if this is actually necessary though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs quite a bit of documentation, since this relies on functionality that is not in the official documentation. It needs at least links to where you found the__hip_atomic_load|store
, and the different scopes. If the references themselves don't have much documentation, then also properly documenting them would be necessary.
@yhmtsai The tests run fine, and this was a suggestion by an AMD engineer, so I'm confident we can use them. If the intrinsics weren't supported, it would fail to compile instead. |
0fef39f
to
39edfb8
Compare
Kudos, SonarCloud Quality Gate passed! 0 Bugs No Coverage information The version of Java (11.0.3) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17. |
rocm-clang
supports the GCC__atomic
intrinsics, which we can use to implement the atomic operations instead of usingvolatile
and memory fences.This is a follow-up to #1344, so I requested reviews from the same reviewers