Commit
Merge branch 'rocm_sdk_update' of github.com:RRZE-HPC/likwid into rocm_sdk_update
Showing 23 changed files with 176 additions and 6 deletions.
10 files renamed without changes.
@@ -0,0 +1,15 @@
SHORT GDS Instructions

EVENTSET
ROCM0 ROCP_SQ_INSTS_GDS
ROCM1 ROCP_SQ_WAVES

METRICS
GPU GDS rw insts per work-item ROCM0/ROCM1

LONG
Formulas:
GPU GDS rw insts per work-item = ROCP_SQ_INSTS_GDS/ROCP_SQ_WAVES
--
The average number of GDS read or GDS write instructions executed
per work item (affected by flow control).
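The instruction groups in this commit (GDS, SALU, SFetch, VALU) all use the same ROCM0/ROCM1 ratio. A minimal Python sketch of that arithmetic follows. The function name and the zero-wavefront handling are assumptions made for illustration and are not part of LIKWID. Note that the divisor in the formula is ROCP_SQ_WAVES, so the ratio is taken over wavefronts.

```python
def insts_per_wave(inst_count: float, wave_count: float) -> float:
    """Evaluate the group formula ROCM0/ROCM1 (instruction count / ROCP_SQ_WAVES)."""
    if wave_count == 0:
        # Assumption: report 0.0 when no wavefronts were counted,
        # instead of raising ZeroDivisionError.
        return 0.0
    return inst_count / wave_count


if __name__ == "__main__":
    # Hypothetical counter readings, not measured data.
    print(insts_per_wave(1200, 400))
```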
@@ -0,0 +1,18 @@
SHORT Memory utilization

EVENTSET
ROCM0 ROCP_TA_TA_BUSY
ROCM1 ROCP_GRBM_GUI_ACTIVE
ROCM2 ROCP_SE_NUM

METRICS
GPU memory utilization 100*max(ROCM0,16)/ROCM1/ROCM2

LONG
Formulas:
GPU memory utilization = 100*max(ROCP_TA_TA_BUSY,16)/ROCP_GRBM_GUI_ACTIVE/ROCP_SE_NUM
--
The percentage of GPUTime the memory unit is active. The result includes
the stall time (MemUnitStalled). This is measured with all extra fetches
and writes and any cache or memory effects taken into account.
Value range: 0% to 100% (fetch-bound).
@@ -0,0 +1,23 @@
SHORT PCI Transfers

EVENTSET
ROCM0 RSMI_PCI_THROUGHPUT_SENT
ROCM1 RSMI_PCI_THROUGHPUT_RECEIVED


METRICS
Runtime time
PCI sent ROCM0
PCI received ROCM1
PCI send bandwidth 1E-6*ROCM0/time
PCI recv bandwidth 1E-6*ROCM1/time

LONG
Formulas:
PCI sent = RSMI_PCI_THROUGHPUT_SENT
PCI received = RSMI_PCI_THROUGHPUT_RECEIVED
PCI send bandwidth = 1E-6*RSMI_PCI_THROUGHPUT_SENT/runtime
PCI recv bandwidth = 1E-6*RSMI_PCI_THROUGHPUT_RECEIVED/runtime
--
Currently not usable since the RSMI_PCI_THROUGHPUT_* events require
one second per call, so 2 seconds for both of them.
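The bandwidth formulas of the PCI group scale a raw counter by 1E-6 and divide by the runtime. The sketch below only illustrates that arithmetic. It assumes the raw value is a byte count and the runtime is in seconds, which would make the result MByte/s; the group file does not state the units, and the function name is hypothetical.

```python
def pci_bandwidth(raw_count: float, runtime_s: float) -> float:
    """Evaluate 1E-6*ROCMx/time from the PCI group.

    Assumption: raw_count is in bytes and runtime_s in seconds,
    so the result would be in MByte/s.
    """
    if runtime_s <= 0:
        raise ValueError("runtime must be positive")
    return 1e-6 * raw_count / runtime_s


if __name__ == "__main__":
    # Hypothetical values: 2,000,000 counted over a 2 second measurement.
    print(pci_bandwidth(2_000_000, 2.0))
```

As the group description notes, each RSMI_PCI_THROUGHPUT_* read takes about one second, so the runtime used here would include that measurement overhead.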
@@ -0,0 +1,21 @@
SHORT Power, temperature and voltage

EVENTSET
ROCM0 RSMI_POWER_AVE[0]
ROCM1 RSMI_TEMP_EDGE
ROCM2 RSMI_VOLT_VDDGFX


METRICS
Power average 1E-6*ROCM0
Edge temperature 1E-3*ROCM1
Voltage 1E-3*ROCM2

LONG
Formulas:
Power average = 1E-6*RSMI_POWER_AVE[0]
Edge temperature = 1E-3*RSMI_TEMP_EDGE
Voltage = 1E-3*RSMI_VOLT_VDDGFX
--
Gets the current average power consumption in watts, the
temperature in degrees Celsius and the voltage in volts.
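The METRICS section of this group applies fixed scale factors to the raw readings. A small Python sketch of the conversion follows. The raw units named in the comments (microwatts, millidegrees Celsius, millivolts) are inferred from the scale factors and the description and should be checked against the ROCm SMI documentation; the function and key names are hypothetical.

```python
def convert_readings(power_raw: float, temp_raw: float, volt_raw: float) -> dict:
    """Apply the scale factors from the METRICS section of the power group."""
    return {
        # 1E-6*ROCM0: assumed microwatts -> watts
        "power_w": 1e-6 * power_raw,
        # 1E-3*ROCM1: assumed millidegrees Celsius -> degrees Celsius
        "temp_c": 1e-3 * temp_raw,
        # 1E-3*ROCM2: assumed millivolts -> volts
        "voltage_v": 1e-3 * volt_raw,
    }


if __name__ == "__main__":
    # Hypothetical raw readings, not measured data.
    print(convert_readings(150_000_000, 45_000, 900))
```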
@@ -0,0 +1,15 @@
SHORT SALU Instructions

EVENTSET
ROCM0 ROCP_SQ_INSTS_SALU
ROCM1 ROCP_SQ_WAVES

METRICS
GPU SALU insts per work-item ROCM0/ROCM1

LONG
Formulas:
GPU SALU insts per work-item = ROCP_SQ_INSTS_SALU/ROCP_SQ_WAVES
--
The average number of scalar ALU instructions executed per work-item
(affected by flow control).
@@ -0,0 +1,15 @@
SHORT SFetch Instructions

EVENTSET
ROCM0 ROCP_SQ_INSTS_SMEM
ROCM1 ROCP_SQ_WAVES

METRICS
GPU SFETCH insts per work-item ROCM0/ROCM1

LONG
Formulas:
GPU SFETCH insts per work-item = ROCP_SQ_INSTS_SMEM/ROCP_SQ_WAVES
--
The average number of scalar fetch instructions from the video memory
executed per work-item (affected by flow control).
@@ -0,0 +1,19 @@
SHORT ALU stalled by LDS

EVENTSET
ROCM0 ROCP_SQ_WAIT_INST_LDS
ROCM1 ROCP_SQ_WAVES
ROCM2 ROCP_GRBM_GUI_ACTIVE

METRICS
GPU ALU stalled 100*ROCM0*4/ROCM1/ROCM2

LONG
Formulas:
GPU ALU stalled = 100*ROCP_SQ_WAIT_INST_LDS*4/ROCP_SQ_WAVES/ROCP_GRBM_GUI_ACTIVE
--
The percentage of GPUTime ALU units are stalled by the LDS input queue
being full or the output queue being not ready. If there are LDS bank
conflicts, reduce them. Otherwise, try reducing the number of LDS
accesses if possible.
Value range: 0% (optimal) to 100% (bad).
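The stall percentage of this group can be evaluated as in the following Python sketch. It mirrors the formula term by term; the function name and the error raised for zero divisors are assumptions made for illustration.

```python
def lds_stall_percent(wait_inst_lds: float, waves: float, gui_active: float) -> float:
    """Evaluate 100*ROCM0*4/ROCM1/ROCM2 from the LDS stall group."""
    if waves == 0 or gui_active == 0:
        # Assumption: treat missing activity as an error rather than
        # returning a made-up percentage.
        raise ValueError("waves and gui_active must be non-zero")
    return 100 * wait_inst_lds * 4 / waves / gui_active


if __name__ == "__main__":
    # Hypothetical counter readings, not measured data.
    print(lds_stall_percent(1000, 10, 1000))
```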
@@ -0,0 +1,18 @@
SHORT GPU utilization

EVENTSET
ROCM0 ROCP_GRBM_COUNT
ROCM1 ROCP_GRBM_GUI_ACTIVE


METRICS
GPU utilization 100*ROCM1/ROCM0


LONG
Formulas:
GPU utilization = 100*ROCP_GRBM_GUI_ACTIVE/ROCP_GRBM_COUNT
--
This group reproduces the 'GPUBusy' metric provided by RocProfiler.
Note that the GPUBusy metric can also be selected directly; the calculation
is then done internally, which stays correct if the metric formula changes.
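For reference, the utilization formula of this group reduces to a single ratio of the two counters. A minimal Python sketch, with a hypothetical function name and a zero-count guard that is an assumption:

```python
def gpu_utilization_percent(gui_active: float, grbm_count: float) -> float:
    """Evaluate 100*ROCM1/ROCM0, i.e. 100*ROCP_GRBM_GUI_ACTIVE/ROCP_GRBM_COUNT."""
    if grbm_count == 0:
        # Assumption: no counted cycles means 0% utilization.
        return 0.0
    return 100 * gui_active / grbm_count


if __name__ == "__main__":
    # Hypothetical counter readings, not measured data.
    print(gpu_utilization_percent(750, 1000))
```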
@@ -0,0 +1,15 @@
SHORT VALU Instructions

EVENTSET
ROCM0 ROCP_SQ_INSTS_VALU
ROCM1 ROCP_SQ_WAVES

METRICS
GPU VALU insts per work-item ROCM0/ROCM1

LONG
Formulas:
GPU VALU insts per work-item = ROCP_SQ_INSTS_VALU/ROCP_SQ_WAVES
--
The average number of vector ALU instructions executed per work-item
(affected by flow control).
@@ -0,0 +1,15 @@
SHORT Wavefronts

EVENTSET
ROCM0 ROCP_SQ_WAVES


METRICS
GPU wavefronts ROCM0


LONG
Formulas:
GPU wavefronts = ROCP_SQ_WAVES
--
Total Wavefronts
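All group files added in this commit share the same layout: a SHORT line, then EVENTSET, METRICS and LONG sections. The Python sketch below shows one way such a file could be read into a dictionary. It is not LIKWID's own parser; the function name is hypothetical, and it assumes that a metric formula is the last whitespace-separated token on its line (true for the files shown here, where formulas contain no spaces).

```python
def parse_group(text: str) -> dict:
    """Split a group file into its SHORT, EVENTSET, METRICS and LONG parts."""
    group = {"short": "", "eventset": {}, "metrics": [], "long": []}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate sections
        if section is None and line.startswith("SHORT "):
            group["short"] = line[len("SHORT "):].strip()
        elif line in ("EVENTSET", "METRICS", "LONG"):
            section = line
        elif section == "EVENTSET":
            # "<counter> <event>", e.g. "ROCM0 ROCP_SQ_WAVES"
            counter, event = line.split(None, 1)
            group["eventset"][counter] = event
        elif section == "METRICS":
            # Assumption: the formula is the last token and has no spaces.
            name, formula = line.rsplit(None, 1)
            group["metrics"].append((name, formula))
        elif section == "LONG":
            group["long"].append(line)
    return group


if __name__ == "__main__":
    sample = (
        "SHORT Wavefronts\n\nEVENTSET\nROCM0 ROCP_SQ_WAVES\n\n"
        "METRICS\nGPU wavefronts ROCM0\n\n"
        "LONG\nFormulas:\nGPU wavefronts = ROCP_SQ_WAVES\n--\nTotal Wavefronts\n"
    )
    print(parse_group(sample))
```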