
TestAccuracy


Test the accuracy of some derived metrics

Introduction

Data from hardware performance counters seems to offer a complete, valid and reliable view of the operations performed at hardware level, but is the data really complete, valid and reliable? The LIKWID team has used hardware counters for quite some time, and we have seen events over- or undercounting as well as many accurate ones.

Benchmark applications

In order to compare the measured data with calculated values, an application is needed that has the following features:

  • Parseable output of the metric of interest
  • Known instruction stream (to specify a valid scaling factor)
  • Easy to instrument using LIKWID's Marker API

An application that offers all the above points is likwid-bench, because for the assembly benchmarks we can calculate exactly the performed floating-point operations and the consumed data volume. Moreover, likwid-bench can easily be instrumented with the LIKWID Marker API. Nevertheless, likwid-bench currently offers only streaming benchmarks, hence not all metrics of interest, such as branch prediction or energy consumption, can be covered.
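
As a rough illustration of what an instrumented run boils down to, the Marker API version of likwid-bench is started underneath likwid-perfctr with marker support enabled. The group, core selection and working set below are only placeholders following the usual LIKWID command line conventions, not the values the accuracy tool actually uses:

likwid-perfctr -C S0:0 -g L2 -m ./likwid-bench -t load -w S0:1MB:1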

Accuracy test tool

The accuracy tool included in the LIKWID suite is written in Python and compares the calculated metric results of likwid-bench with the measured and derived ones of likwid-perfctr. First it runs the application without instrumentation and afterwards the instrumented one using the same data size and CPU cores. We could use only the instrumented version, but we want to reduce the influence of LIKWID's Marker API on the results.

The accuracy tool can be found in the LIKWID sources in the folder test/accuracy, and all following paths are relative to this folder. The sequence of test runs is defined by the files in the TESTS folder. An example definition looks like this:

REGEX_BENCH MByte\/s:\s+([0-9]+)
REGEX_PERF \|\s+L2 bandwidth \[MBytes\/s\]\s+\|\s+([0-9\.e\+\-]+)

TEST load
RUNS 5
WA_FACTOR 1.0
VARIANT 12kB 20000
VARIANT 1MB 10000
VARIANT  4MB 7500
VARIANT  1GB 50

TEST xxx
[...]

The REGEX_BENCH is used to parse the data from the output of likwid-bench and REGEX_PERF to parse the output of likwid-perfctr. After an empty line, the test blocks can be listed. The string after TEST defines the benchmark kernel used for likwid-bench. How often each data size should be tested is defined by RUNS. The WA_FACTOR is required to scale the output of likwid-bench so that the results take write-allocate traffic into account. Finally, there are multiple lines of the form VARIANT <size> <iterations>. It is recommended to use selected sizes to see the influence of the CPU caches. The iteration definition is not needed anymore, because starting with version 4.0.0 of LIKWID, likwid-bench determines a suitable iteration count itself to produce reliable results.
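
As a made-up example, the REGEX_PERF line shown above would match a likwid-perfctr output line like the following (the number is purely illustrative) and extract the value 21873.41, while REGEX_BENCH picks up the MByte/s figure printed by likwid-bench:

|   L2 bandwidth [MBytes/s]   |   21873.41   |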

Which tests should be performed can be defined in the file SETS.txt, each line specifying one test file without the suffix .txt, or it can be set on the command line using the -s/--sets <comma-separated list> option.
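
A hypothetical SETS.txt selecting two test definitions, TESTS/load.txt and TESTS/store.txt (assuming such a store test file has been created), would simply contain:

load
store

The same selection on the command line would be ./likwid-accuracy -s load,store.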

The tool has some command line options to display the comparison:

Option Comment
--grace Write an input file for Xmgrace (PNG)
--gnuplot Write an input file for gnuplot (JPG)
--pgf Write an input file for PGFPlots (PDF)
--script Write a script to results directory creating all images
--scriptname Specify the filename for the script file

The results of an accuracy run are stored in RESULTS/<hostname>. The output of all runs is stored in the .raw files. The input files for plotting are named .dat, where the plain files contain the results of likwid-bench, the marker files those of likwid-perfctr, and the correct files the results of the plain files scaled with the WA_FACTOR.
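
Assuming the hostname is testhost and only the load test from the example above has been run, the results folder might look roughly like this (the exact file names are illustrative, not guaranteed):

RESULTS/testhost/load.raw
RESULTS/testhost/load_plain.dat
RESULTS/testhost/load_marker.dat
RESULTS/testhost/load_correct.dat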

Depending on the command line options, there are also .plot files for gnuplot, .agr files for Xmgrace and .tex files for PGFPlots. In order to allow all plotting tools to be used simultaneously, each tool uses a different output format, as noted in the table above. Finally, the script file to create all images is placed there; the default filename for the script is create_plots.sh. Each plotting backend provides more or less detail about the tests; gnuplot is, for example, the only backend plotting the data sizes on the x-axis.

Running accuracy tests

First, both versions of likwid-bench must be compiled. This can be done easily by calling

make

in the base folder (test/accuracy) of the accuracy test tool. It compiles likwid-bench with and without instrumentation and copies the executables to the current folder. The accuracy tool uses the likwid-perfctr executable in the current source tree, not a possibly installed system-wide one, hence the path to the access daemon must be set in config.mk before running make in the accuracy tool folder.
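
If the access daemon is used, the relevant setting in config.mk of the LIKWID source tree is the path to the daemon executable; the path below is only an example and has to be adjusted to the local installation:

ACCESSDAEMON = /usr/local/sbin/likwid-accessD

Afterwards run make in test/accuracy as shown above.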

After setting up the executables, start the test runs

./likwid-accuracy --gnuplot --script

It prints a * for each run. In the end, go to the results folder and create the plots.

cd RESULTS/$(hostname -s)
./create_plots.sh

Tested microarchitectures

Since we are working in a computing center, we have a wide range of microarchitectures in-house. We tested the accuracy on most of these architectures. Here is a list of all tested architectures with a link to their accuracy results.

Problems and ideas

At the moment, the accuracy tool is fixed to single-threaded likwid-bench; it would be nice to allow different benchmark applications and to use multiple threads. Moreover, other hardware performance counter tools like PAPI or perf_event could be integrated to see whether they do a more accurate job. There are already some parts that have been extended to use PAPI, but there is no PAPI integration in likwid-bench.
