Merge branch 'rocm_sdk_update' of github.com:RRZE-HPC/likwid into rocm_sdk_update
TomTheBear committed Oct 31, 2024
2 parents 5a1de25 + abc8001 commit 741c2ad
Showing 23 changed files with 176 additions and 6 deletions.
10 files renamed without changes.
15 changes: 15 additions & 0 deletions groups/amd_gpu_v1/GDS.txt
@@ -0,0 +1,15 @@
SHORT GDS Instructions

EVENTSET
ROCM0 ROCP_SQ_INSTS_GDS
ROCM1 ROCP_SQ_WAVES

METRICS
GPU GDS rw insts per work-item ROCM0/ROCM1

LONG
Formulas:
GPU GDS rw insts per work-item = ROCP_SQ_INSTS_GDS/ROCP_SQ_WAVES
--
The average number of GDS read or GDS write instructions executed
per work-item (affected by flow control).
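
The GDS, SALU, SFETCH and VALU groups all divide an instruction counter by ROCP_SQ_WAVES. A minimal sketch of that ratio in C follows. The function name and the zero-wavefront guard are assumptions made for illustration and are not taken from LIKWID.

```c
#include <stdint.h>

/* Ratio of an instruction counter (e.g. ROCP_SQ_INSTS_GDS) to the
 * wavefront counter (ROCP_SQ_WAVES), mirroring ROCM0/ROCM1.
 * Returning 0.0 when no wavefronts were counted is an assumption of
 * this sketch, not documented LIKWID behavior. */
double insts_per_wave(uint64_t insts, uint64_t waves)
{
    if (waves == 0) {
        return 0.0;
    }
    return (double)insts / (double)waves;
}
```

For example, 200 counted instructions over 100 wavefronts gives a ratio of 2.0.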
18 changes: 18 additions & 0 deletions groups/amd_gpu_v1/MEM.txt
@@ -0,0 +1,18 @@
SHORT Memory utilization

EVENTSET
ROCM0 ROCP_TA_TA_BUSY
ROCM1 ROCP_GRBM_GUI_ACTIVE
ROCM2 ROCP_SE_NUM

METRICS
GPU memory utilization 100*max(ROCM0,16)/ROCM1/ROCM2

LONG
Formulas:
GPU memory utilization = 100*max(ROCP_TA_TA_BUSY,16)/ROCP_GRBM_GUI_ACTIVE/ROCP_SE_NUM
--
The percentage of GPUTime the memory unit is active. The result includes
the stall time (MemUnitStalled). This is measured with all extra fetches
and writes and any cache or memory effects taken into account.
Value range: 0% to 100% (fetch-bound).
23 changes: 23 additions & 0 deletions groups/amd_gpu_v1/PCI.txt
@@ -0,0 +1,23 @@
SHORT PCI Transfers

EVENTSET
ROCM0 RSMI_PCI_THROUGHPUT_SENT
ROCM1 RSMI_PCI_THROUGHPUT_RECEIVED


METRICS
Runtime time
PCI sent ROCM0
PCI received ROCM1
PCI send bandwidth 1E-6*ROCM0/time
PCI recv bandwidth 1E-6*ROCM1/time

LONG
Formulas:
PCI sent = RSMI_PCI_THROUGHPUT_SENT
PCI received = RSMI_PCI_THROUGHPUT_RECEIVED
PCI send bandwidth = 1E-6*RSMI_PCI_THROUGHPUT_SENT/runtime
PCI recv bandwidth = 1E-6*RSMI_PCI_THROUGHPUT_RECEIVED/runtime
--
Currently not usable: each RSMI_PCI_THROUGHPUT_* event takes one second
per read, so reading both events takes two seconds.
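
The bandwidth metrics scale the raw counter by 1E-6 and divide by the runtime. A small C sketch of that arithmetic follows, assuming the counter is a byte count so that the result is in MByte/s. The function name and the guard for a non-positive runtime are hypothetical.

```c
/* 1E-6 * counter / runtime, as in the METRICS section above.
 * Assumption: the counter is in bytes and the runtime in seconds,
 * so the result is in MByte/s. The guard against a non-positive
 * runtime is an addition of this sketch. */
double pci_bandwidth(double counter, double runtime_s)
{
    if (runtime_s <= 0.0) {
        return 0.0;
    }
    return 1.0E-6 * counter / runtime_s;
}
```

With a counter of 4.0e9 and a runtime of 2 seconds, the function returns roughly 2000.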
21 changes: 21 additions & 0 deletions groups/amd_gpu_v1/POWER.txt
@@ -0,0 +1,21 @@
SHORT Power, temperature and voltage

EVENTSET
ROCM0 RSMI_POWER_AVE[0]
ROCM1 RSMI_TEMP_EDGE
ROCM2 RSMI_VOLT_VDDGFX


METRICS
Power average 1E-6*ROCM0
Edge temperature 1E-3*ROCM1
Voltage 1E-3*ROCM2

LONG
Formulas:
Power average = 1E-6*RSMI_POWER_AVE[0]
Edge temperature = 1E-3*RSMI_TEMP_EDGE
Voltage = 1E-3*RSMI_VOLT_VDDGFX
--
Reports the current average power consumption in watts, the edge
temperature in degrees Celsius and the voltage in volts.
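
The scale factors in the METRICS section imply raw readings in microwatts, millidegrees Celsius and millivolts. The C sketch below applies the same conversions under that assumption. The struct and function names are hypothetical and not part of LIKWID or ROCm SMI.

```c
#include <stdint.h>

/* Converted sensor readings; the type is illustrative only. */
typedef struct {
    double power_w; /* watts */
    double temp_c;  /* degrees Celsius */
    double volt_v;  /* volts */
} GpuSensors;

/* Apply the scale factors from the METRICS section:
 * 1E-6 for power, 1E-3 for temperature and voltage.
 * Assumption: inputs are microwatts, millidegrees Celsius, millivolts. */
GpuSensors convert_sensors(uint64_t power_uw, int64_t temp_mc, int64_t volt_mv)
{
    GpuSensors s;
    s.power_w = 1.0E-6 * (double)power_uw;
    s.temp_c = 1.0E-3 * (double)temp_mc;
    s.volt_v = 1.0E-3 * (double)volt_mv;
    return s;
}
```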
15 changes: 15 additions & 0 deletions groups/amd_gpu_v1/SALU.txt
@@ -0,0 +1,15 @@
SHORT SALU Instructions

EVENTSET
ROCM0 ROCP_SQ_INSTS_SALU
ROCM1 ROCP_SQ_WAVES

METRICS
GPU SALU insts per work-item ROCM0/ROCM1

LONG
Formulas:
GPU SALU insts per work-item = ROCP_SQ_INSTS_SALU/ROCP_SQ_WAVES
--
The average number of scalar ALU instructions executed per work-item
(affected by flow control).
15 changes: 15 additions & 0 deletions groups/amd_gpu_v1/SFETCH.txt
@@ -0,0 +1,15 @@
SHORT SFetch Instructions

EVENTSET
ROCM0 ROCP_SQ_INSTS_SMEM
ROCM1 ROCP_SQ_WAVES

METRICS
GPU SFETCH insts per work-item ROCM0/ROCM1

LONG
Formulas:
GPU SFETCH insts per work-item = ROCP_SQ_INSTS_SMEM/ROCP_SQ_WAVES
--
The average number of scalar fetch instructions from the video memory
executed per work-item (affected by flow control).
19 changes: 19 additions & 0 deletions groups/amd_gpu_v1/STALLED.txt
@@ -0,0 +1,19 @@
SHORT ALU stalled by LDS

EVENTSET
ROCM0 ROCP_SQ_WAIT_INST_LDS
ROCM1 ROCP_SQ_WAVES
ROCM2 ROCP_GRBM_GUI_ACTIVE

METRICS
GPU ALU stalled 100*ROCM0*4/ROCM1/ROCM2

LONG
Formulas:
GPU ALU stalled = 100*ROCP_SQ_WAIT_INST_LDS*4/ROCP_SQ_WAVES/ROCP_GRBM_GUI_ACTIVE
--
The percentage of GPUTime that ALU units are stalled because the LDS input
queue is full or the output queue is not ready. If there are LDS bank
conflicts, reduce them. Otherwise, try reducing the number of LDS
accesses if possible.
Value range: 0% (optimal) to 100% (bad).
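
The stall percentage formula can be written out as a short C function. This is a sketch only: the function name is hypothetical, and returning 0.0 for a zero denominator is an assumption rather than LIKWID behavior.

```c
/* 100 * ROCP_SQ_WAIT_INST_LDS * 4 / ROCP_SQ_WAVES / ROCP_GRBM_GUI_ACTIVE,
 * evaluated left to right as in the group file.
 * Assumption: a zero denominator yields 0.0 instead of a division error. */
double alu_stalled_percent(double wait_inst_lds, double waves, double gui_active)
{
    if (waves == 0.0 || gui_active == 0.0) {
        return 0.0;
    }
    return 100.0 * wait_inst_lds * 4.0 / waves / gui_active;
}
```

For instance, 50 wait cycles, 10 wavefronts and 100 active cycles evaluate to 20 percent.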
18 changes: 18 additions & 0 deletions groups/amd_gpu_v1/UTIL.txt
@@ -0,0 +1,18 @@
SHORT GPU utilization

EVENTSET
ROCM0 ROCP_GRBM_COUNT
ROCM1 ROCP_GRBM_GUI_ACTIVE


METRICS
GPU utilization 100*ROCM1/ROCM0


LONG
Formulas:
GPU utilization = 100*ROCP_GRBM_GUI_ACTIVE/ROCP_GRBM_COUNT
--
This group resembles the 'GPUBusy' metric provided by RocProfiler.
We should add that we can select the GPUBusy metric directly and the
calculations are done internally in case the metric formula changes.
15 changes: 15 additions & 0 deletions groups/amd_gpu_v1/VALU.txt
@@ -0,0 +1,15 @@
SHORT VALU Instructions

EVENTSET
ROCM0 ROCP_SQ_INSTS_VALU
ROCM1 ROCP_SQ_WAVES

METRICS
GPU VALU insts per work-item ROCM0/ROCM1

LONG
Formulas:
GPU VALU insts per work-item = ROCP_SQ_INSTS_VALU/ROCP_SQ_WAVES
--
The average number of vector ALU instructions executed per work-item
(affected by flow control).
15 changes: 15 additions & 0 deletions groups/amd_gpu_v1/WAVE.txt
@@ -0,0 +1,15 @@
SHORT Wavefronts

EVENTSET
ROCM0 ROCP_SQ_WAVES


METRICS
GPU wavefronts ROCM0


LONG
Formulas:
GPU wavefronts = ROCP_SQ_WAVES
--
Total number of wavefronts.
4 changes: 0 additions & 4 deletions src/includes/rocmon_sdk.h
@@ -397,7 +397,6 @@ _rocmon_sdk_read_buffers(rocprofiler_context_id_t device_context,
if(h->category == ROCPROFILER_BUFFER_CATEGORY_COUNTERS && h->kind == ROCPROFILER_COUNTER_RECORD_VALUE)
{
rocprofiler_record_counter_t* r = h->payload;
-printf("Counter ID %d Value %f Dispatch %ld\n", r->id, r->counter_value, r->dispatch_id);
rocprofiler_counter_id_t cid = {.handle = 0};
(*rocprofiler_query_record_counter_id_ptr)(r->id, &cid);
for (int j = 0; j < context->numDevices; j++)
@@ -619,8 +618,6 @@ _rocmon_sdk_set_profile(rocprofiler_context_id_t context_id,





rocprofiler_tool_configure_result_t*
rocprofiler_configure(uint32_t version,
const char* runtime_version,
@@ -658,7 +655,6 @@ rocmon_sdk_init(RocmonContext* context, int numGpus, const int* gpuIds)
}
if (rocmon_sdk_initialized)
{

return 0;
}

2 changes: 1 addition & 1 deletion src/includes/rocmon_smi.h
@@ -1156,7 +1156,7 @@ void rocmon_smi_finalize(RocmonContext* context)
}
ROCMON_DEBUG_PRINT(DEBUGLEV_DEVELOP, Shutdown RSMI);
RSMI_CALL(rsmi_shut_down, (), {
-ROCMON_DEBUG_PRINT(DEBUGLEV_DEVELOP, Shutdown SMI);
+ERROR_PRINT(DEBUGLEV_DEVELOP, Shutdown SMI failed);
// fall through
});
rocmon_smi_initialized = FALSE;
2 changes: 1 addition & 1 deletion src/includes/rocmon_v1.h
@@ -575,7 +575,7 @@ rocmon_v1_finalize(RocmonContext* context)
}

ROCM_CALL(hsa_shut_down, (), {
-//ROCMON_DEBUG_PRINT(DEBUGLEV_DEVELOP, Shutdown HSA);
+ERROR_PRINT(DEBUGLEV_DEVELOP, Shutdown HSA failed);
// fall through
});
}
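
Both finalize hunks follow the same pattern: call a library shutdown routine, report an error if it fails, then fall through and continue cleanup. A self-contained C sketch of that pattern follows. The macro CALL_OR_HANDLE, the function finalize_backend and the two stub shutdown functions are hypothetical stand-ins. They are not the RSMI_CALL, ROCM_CALL or ERROR_PRINT definitions from LIKWID, which are not shown in this commit.

```c
#include <stdio.h>

/* Hypothetical shutdown routine type: returns 0 on success. */
typedef int (*shutdown_fn)(void);

/* Hypothetical macro: evaluate `call`; on a non-zero result, print an
 * error and run the `on_error` block. Illustrative only. */
#define CALL_OR_HANDLE(call, on_error)                                   \
    do {                                                                 \
        int rc_ = (call);                                                \
        if (rc_ != 0) {                                                  \
            fprintf(stderr, "%s failed with code %d\n", #call, rc_);     \
            on_error                                                     \
        }                                                                \
    } while (0)

/* Stub shutdown routines used to exercise both paths. */
int shutdown_ok(void) { return 0; }
int shutdown_fail(void) { return -1; }

/* Shut the backend down. A failure is reported and recorded, but the
 * cleanup still runs (the "fall through" in the hunks above).
 * Returns 1 if the shutdown call failed, 0 otherwise. */
int finalize_backend(shutdown_fn shut_down, int *initialized)
{
    int failed = 0;
    CALL_OR_HANDLE(shut_down(), {
        failed = 1; /* record the failure, then fall through */
    });
    *initialized = 0;
    return failed;
}
```

The initialized flag is cleared on both paths, so a failed shutdown does not leave the backend marked as initialized.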