Replace `volatile` by proper memory ordering in HIP #1472

upsj · 2023-11-25T15:20:18Z

rocm-clang supports the GCC __atomic intrinsics, which we can use to implement the atomic operations instead of using volatile and memory fences.

This is a follow-up to #1344, so I requested reviews from the same reviewers.
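As a rough illustration of the idea in the description, the following host-compilable sketch shows how the GCC `__atomic` builtins express a memory ordering directly, where a `volatile` access would need a separate fence. The helper names are hypothetical and not taken from the PR; real HIP code would additionally mark them `__device__`.

```cpp
// Sketch only: the helper names below are made up for illustration.
// Each helper wraps one GCC/Clang __atomic builtin with a fixed memory order.

// Relaxed load: the access is atomic, but it does not order
// any surrounding memory operations.
inline int load_relaxed(const int* ptr)
{
    return __atomic_load_n(ptr, __ATOMIC_RELAXED);
}

// Acquire load: later memory operations cannot be reordered before it.
inline int load_acquire(const int* ptr)
{
    return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
}

// Release store: earlier memory operations cannot be reordered after it.
inline void store_release(int* ptr, int value)
{
    __atomic_store_n(ptr, value, __ATOMIC_RELEASE);
}
```

A writer would typically publish data with `store_release` on a flag, and a reader would poll the flag with `load_acquire` before touching the data.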

yhmtsai

Do you have documentation showing that rocm-clang supports the GCC atomic intrinsics on the GPU?
Is rocm-clang (invoked through hipcc) the only compiler for AMD GPUs?

dev_tools/scripts/generate_cuda_memory_ptx.py

upsj · 2023-11-27T16:37:29Z

This is based on communication with an AMD engineer, and on the fact that clang has to be able to compile libstdc++, which relies on these intrinsics. We are not using any other compiler, and I'm not aware of any that we should be looking at. But thanks for the hint: I looked at what rocSPARSE is doing, and they are using __hip_atomic_load, which gives us the same kind of control over scope and ordering but isn't documented, so I'm not sure we can rely on it too much.

upsj · 2023-11-27T16:58:22Z

Relevant: https://github.com/ROCm-Developer-Tools/llvm-project/blob/6b9c186b2d4c8bd315034a9655a28d32bcf745ab/clang/test/SemaCUDA/atomic-ops.cu#L6

upsj · 2023-11-28T18:25:49Z

Looks like the intrinsics are not supported by older HIP versions yet. I'll fall back on the GCC versions then.
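One way such a fallback could look is sketched below. The macro names and the guard are assumptions for illustration: the guard tests for the `__HIP_MEMORY_SCOPE_AGENT` macro that clang predefines in HIP mode, which is not necessarily the condition the PR uses. Without that macro, the scope argument is dropped and the GCC builtin is used.

```cpp
// Sketch only: SKETCH_* names are hypothetical, and the guard is an assumption.
#if defined(__HIP_MEMORY_SCOPE_AGENT)
// Scoped HIP builtin: memory order plus an explicit scope.
#define SKETCH_ATOMIC_LOAD(ptr, order, scope) \
    __hip_atomic_load(ptr, order, scope)
#define SKETCH_SCOPE_DEVICE __HIP_MEMORY_SCOPE_AGENT
#else
// Fallback: GCC builtin, which has no scope parameter.
#define SKETCH_ATOMIC_LOAD(ptr, order, scope) __atomic_load_n(ptr, order)
#define SKETCH_SCOPE_DEVICE 0  // placeholder, ignored by the fallback
#endif

// In real HIP code this would be a __device__ function.
inline int load_relaxed_device_scope(const int* ptr)
{
    return SKETCH_ATOMIC_LOAD(ptr, __ATOMIC_RELAXED, SKETCH_SCOPE_DEVICE);
}
```

Call sites stay the same in both configurations; only the macro expansion changes.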

yhmtsai

As for the code itself, LGTM.
I am still not sure whether the GCC intrinsics work on GPU addresses. Being able to compile libstdc++ only says something about the CPU side, not the GPU, right?

hip/components/memory.hip.hpp

yhmtsai · 2023-11-29T14:20:23Z

hip/components/memory.hip.hpp

 }


 template <typename ValueType>
-__device__ __forceinline__ ValueType load_relaxed_shared(const ValueType* ptr)
+__device__ __forceinline__ thrust::complex<ValueType> load_relaxed(


Maybe only for thrust::complex<double>?
thrust::complex<float> can be handled with a single int64 access.

~~I don't follow, we only use thrust::complex with float value types~~ Formatting issues with templates. We don't need full atomicity with thrust::complex, only element-wise, so this allows a more efficient code generation. Maybe the compiler even combines them? Doesn't matter much
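The element-wise approach described in the reply can be sketched as follows. This is a host-side illustration under stated assumptions: `complex_pair` is a hypothetical stand-in for `thrust::complex<float>`, and the helper names are made up. Each component is loaded atomically on its own, so the pair as a whole is not read atomically.

```cpp
// Sketch only: complex_pair is a hypothetical stand-in for
// thrust::complex<float>; it is not a type from the PR.
struct complex_pair {
    float re;
    float im;
};

// Generic __atomic_load works for any trivially copyable type,
// including float, and writes the loaded value into `result`.
inline float load_relaxed_float(const float* ptr)
{
    float result;
    __atomic_load(ptr, &result, __ATOMIC_RELAXED);
    return result;
}

// Element-wise load: two independent relaxed loads.
// A concurrent writer may be observed between the two components.
inline complex_pair load_relaxed(const complex_pair* ptr)
{
    return {load_relaxed_float(&ptr->re), load_relaxed_float(&ptr->im)};
}
```

The alternative raised in the review, a single 64-bit access for the float case, would make the whole value atomic at the cost of treating it as one integer.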

yhmtsai · 2023-11-29T14:21:06Z

hip/components/memory.hip.hpp

@@ -6,6 +6,7 @@
 #define GKO_HIP_COMPONENTS_MEMORY_HIP_HPP_


+#include <cstring>


Is it used?

memcpy is declared in string.h; I'm not sure whether this include is actually necessary, though.

MarcelKoch

I think this needs quite a bit of documentation, since it relies on functionality that is not in the official documentation. It needs at least links to where you found __hip_atomic_load and __hip_atomic_store, and the different scopes. If the references themselves don't have much documentation, then properly documenting them here would also be necessary.

hip/components/memory.hip.hpp

cuda/components/memory.cuh

upsj · 2023-11-29T21:13:46Z

@yhmtsai The tests run fine, and this was a suggestion by an AMD engineer, so I'm confident we can use them. If the intrinsics weren't supported, it would fail to compile instead.

sonarqubecloud · 2023-11-30T02:06:49Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

No Coverage information
0.0% Duplication

The version of Java (11.0.3) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.

upsj added the 1:ST:ready-for-review label Nov 25, 2023

upsj requested review from thoasm and yhmtsai November 25, 2023 15:20

upsj self-assigned this Nov 25, 2023

ginkgo-bot added the mod:cuda, mod:hip, and reg:helper-scripts labels Nov 25, 2023

upsj requested a review from MarcelKoch November 25, 2023 15:21

yhmtsai reviewed Nov 27, 2023

dev_tools/scripts/generate_cuda_memory_ptx.py (outdated)

upsj force-pushed the hip_memory_order branch from cb10ce0 to cb5fee9 Compare November 27, 2023 16:54

upsj force-pushed the hip_memory_order branch from cb5fee9 to c0027e6 Compare November 28, 2023 13:08

upsj added the 1:ST:no-changelog-entry label Nov 28, 2023

upsj force-pushed the hip_memory_order branch from b989041 to 0fef39f Compare November 28, 2023 23:13

upsj requested a review from yhmtsai November 29, 2023 07:18

yhmtsai reviewed Nov 29, 2023

MarcelKoch requested changes Nov 29, 2023

hip/components/memory.hip.hpp (outdated)

hip/components/memory.hip.hpp (outdated)

cuda/components/memory.cuh (resolved)

upsj added 4 commits November 29, 2023 22:44

use gcc atomic intrinsics for HIP

8ac37ac

use hip atomic intrinsics

8721fcc

fall back to gcc intrinsics

a71b941

add documentation to HIP atomic operations

39edfb8

upsj force-pushed the hip_memory_order branch from 0fef39f to 39edfb8 Compare November 29, 2023 21:44

upsj requested review from MarcelKoch and yhmtsai November 29, 2023 21:44

upsj added the 1:ST:run-full-test label Nov 30, 2023

remove unnecessary SFINAE

0bd702c

yhmtsai approved these changes Nov 30, 2023

MarcelKoch approved these changes Nov 30, 2023

upsj added the 1:ST:ready-to-merge label and removed the 1:ST:ready-for-review and 1:ST:run-full-test labels Nov 30, 2023

upsj merged commit f2e0449 into develop Nov 30, 2023
12 of 13 checks passed

upsj deleted the hip_memory_order branch November 30, 2023 14:46
