OpenMP Callbacks

Felix Uhl edited this page Nov 23, 2022 · 4 revisions

OpenMP tracing can, in principle, be activated by adding --enable-openmp to the configure flags (see Download and Compile#Configure Build Directory). However, there is currently no implementation of the callback system that is suitable for proper use in Vftrace (see OpenMP Callbacks#Known Bugs in OpenMP).

OMPT Callback System

Since version 5.0, the OpenMP standard specifies a callback system that lets users register functions to be called when certain OpenMP events are triggered. The following code illustrates how the callback functionality works. It starts a parallel region that spawns four threads, which are immediately joined again.

// test.c
#include <omp.h>
int main() {
   #pragma omp parallel num_threads(4)
   {
   } 
   return 0;
}

The first step in using the callback system is to implement the function ompt_start_tool. The existence of this function tells the OpenMP runtime to initialize the callbacks. The required type definitions are provided by the omp-tools.h header, which needs to be included. The function must return a pointer to a struct that holds pointers to the initialize and finalize routines, both of which the user has to supply.

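// callbacks.c
#include <stdio.h>
#include <omp.h>
#include <omp-tools.h>
// forward declarations of the routines defined further down in this file
int initialize_callbacks(ompt_function_lookup_t lookup, int initial_device_num, ompt_data_t *tool_data);
void finalize_callbacks(ompt_data_t *tool_data);

ompt_start_tool_result_t *ompt_start_tool(unsigned int omp_version, const char *runtime_version) {
   // static storage: the struct must outlive this function call
   static ompt_start_tool_result_t ompt_start_tool_result;
   ompt_start_tool_result.initialize = &initialize_callbacks;
   ompt_start_tool_result.finalize = &finalize_callbacks;
   return &ompt_start_tool_result;
}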

The next step is to define the functions initialize_callbacks and finalize_callbacks. initialize_callbacks first needs to look up ompt_set_callback, the function that registers callbacks. ompt_set_callback is then called for every callback function we want to use in our program. Here only the callbacks for the beginning and end of a parallel region are registered. To do so, we supply pointers to the functions (callback_par_begin, callback_par_end) that should be called when a parallel region begins or ends.

// callbacks.c
int initialize_callbacks(ompt_function_lookup_t lookup, int initial_device_num, ompt_data_t *tool_data) {
   // Get the set_callback function pointer
   ompt_set_callback_t ompt_set_callback = (ompt_set_callback_t)lookup("ompt_set_callback");
   // register the available callback functions
   ompt_set_callback(ompt_callback_parallel_begin, (ompt_callback_t)(&callback_par_begin));
   ompt_set_callback(ompt_callback_parallel_end, (ompt_callback_t)(&callback_par_end));
   // a nonzero return value signals success and keeps the tool active
   return 1;
}
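
The lookup-then-register flow can be easier to follow in a toy model that contains both sides of the interface. The sketch below is not the OMPT API: every toy_ name is hypothetical, the "runtime" is just a table of function pointers, and real callbacks take arguments. It only shows the idea that the tool resolves the registration function by name and then stores one callback per event.

```c
#include <stddef.h>
#include <string.h>

// All toy_ names are hypothetical stand-ins, not part of OMPT.
typedef void (*toy_callback_t)(void);
typedef void (*toy_interface_fn_t)(void);
typedef int (*toy_set_callback_t)(int event, toy_callback_t callback);

enum {TOY_EVENT_PARALLEL_BEGIN = 0, TOY_EVENT_PARALLEL_END = 1, TOY_EVENT_COUNT = 2};

// runtime side: one callback slot per event
static toy_callback_t toy_table[TOY_EVENT_COUNT];

// runtime side: store the tool's function pointer in the slot of the event
static int toy_set_callback(int event, toy_callback_t callback) {
   if (event < 0 || event >= TOY_EVENT_COUNT) {return 0;}
   toy_table[event] = callback;
   return 1;
}

// runtime side: resolve an entry point by name, NULL if the name is unknown
static toy_interface_fn_t toy_lookup(const char *name) {
   if (strcmp(name, "toy_set_callback") == 0) {
      return (toy_interface_fn_t)toy_set_callback;
   }
   return NULL;
}

// runtime side: trigger an event, returns 1 if a callback was invoked
static int toy_dispatch(int event) {
   if (event < 0 || event >= TOY_EVENT_COUNT || toy_table[event] == NULL) {return 0;}
   toy_table[event]();
   return 1;
}

// tool side: a callback that only counts how often it was invoked
static int toy_begin_calls = 0;
static void toy_on_parallel_begin(void) {toy_begin_calls++;}

// tool side: look up the registration function, then register the callback
static int toy_initialize(void) {
   toy_set_callback_t set_callback = (toy_set_callback_t)toy_lookup("toy_set_callback");
   if (set_callback == NULL) {return 0;}
   return set_callback(TOY_EVENT_PARALLEL_BEGIN, toy_on_parallel_begin);
}
```

After toy_initialize() has run, toy_dispatch(TOY_EVENT_PARALLEL_BEGIN) invokes the registered callback, while toy_dispatch(TOY_EVENT_PARALLEL_END) does nothing because no callback was registered for that event.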

The finalize_callbacks function serves no particular purpose in this example and is left empty.

void finalize_callbacks(ompt_data_t *tool_data) {
   (void) tool_data;
}

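The last step is to supply the definitions of the callback functions we just registered. Here the callback functions only print some information about the event. Because they are declared static and initialize_callbacks refers to them, their definitions must appear before initialize_callbacks in callbacks.c.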

static void callback_par_begin(ompt_data_t *encountering_task_data, const ompt_frame_t *encountering_task_frame, ompt_data_t *parallel_data, unsigned int requested_parallelism, int flags, const void *codeptr_ra) {
   fprintf(stderr, "Callback parallel begin called by thread %d on level %d with %u threads\n", omp_get_thread_num(), omp_get_level(), requested_parallelism);
}

static void callback_par_end(ompt_data_t *parallel_data, ompt_data_t *encountering_task_data, int flags, const void *codeptr_ra) {
   fprintf(stderr, "Callback parallel end   called by thread %d on level %d\n", omp_get_thread_num(), omp_get_level());
}

Compiling the sources and executing the resulting binary produces the output shown below. Currently clang is the only compiler whose callback system is sufficient to work at least with these simple tests.

$ clang -fopenmp -c callbacks.c
$ clang -fopenmp -c test.c
$ clang -fopenmp -o test.x test.o callbacks.o
$ ./test.x
Callback parallel begin called by thread 0 on level 0 with 4 threads
Callback parallel end   called by thread 0 on level 0

For a complete list of available callback functions and their arguments, consult the OpenMP standard.
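
Since not every runtime implements every callback, it can help to inspect the value returned by ompt_set_callback, which the example above discards. The sketch below is a minimal helper under stated assumptions: describe_set_result and callback_is_reliable are hypothetical names, and the integer values mirror the ompt_set_result_t enumerators as I understand them from OpenMP 5.0. They are written as plain integers only so that the sketch compiles without omp-tools.h; check them against the header of your runtime.

```c
#include <stddef.h>

// Hypothetical helper: translate the value returned by ompt_set_callback
// into the name of the matching ompt_set_result_t enumerator.
// The numeric values are an assumption based on OpenMP 5.0.
static const char *describe_set_result(int result) {
   switch (result) {
      case 0: return "ompt_set_error";
      case 1: return "ompt_set_never";
      case 2: return "ompt_set_impossible";
      case 3: return "ompt_set_sometimes";
      case 4: return "ompt_set_sometimes_paired";
      case 5: return "ompt_set_always";
      default: return "unknown";
   }
}

// Hypothetical policy: treat a callback as usable for tracing only
// if the runtime promises to invoke it for every matching event.
static int callback_is_reliable(int result) {
   return result == 5;
}
```

A tool could call these helpers on the result of each ompt_set_callback call in initialize_callbacks and print a warning for callbacks that are not reported as always invoked.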

OpenMP in Vftrace

Properly Linking OMPT

If ompt_start_tool is supplied by a library such as Vftrace, the linker does not include it in the final binary: the function is never called, so it is deemed unnecessary. To trick the linker into including it anyway, ompt_start_tool is called in the initialization phase of Vftrace with dummy arguments that make the function return immediately.

void vftr_initialize(void *func, void *call_site) {
...
      // trick the linker into including extra symbols
#ifdef _OMP
      // omp callback symbols
      (void) ompt_start_tool(0, NULL);
#endif
...
}
ompt_start_tool_result_t *ompt_start_tool(unsigned int omp_version, const char *runtime_version) {
   // return from dummy calls that are only done to trick the linker
   // to link all of the OMP-callback layers into the executable
   if (omp_version == 0 && runtime_version == NULL) {return NULL;}
...
}
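
The guard pattern from the two excerpts above can be tried in isolation. The following is a minimal sketch with hypothetical toy_ names and a simplified result struct, not Vftrace's actual code: the library's own initialization issues a dummy call that is recognized by its argument values, so the entry point is referenced without performing any setup.

```c
#include <stddef.h>

// Hypothetical, simplified stand-in for ompt_start_tool_result_t.
typedef struct {
   int initialized;
} toy_tool_result_t;

// Entry point that a runtime would discover by its symbol name.
toy_tool_result_t *toy_start_tool(unsigned int version, const char *runtime_version) {
   static toy_tool_result_t result;
   // dummy call from the library itself: return before any setup happens
   if (version == 0 && runtime_version == NULL) {return NULL;}
   result.initialized = 1;
   return &result;
}

// Library initialization: the otherwise pointless call creates a reference
// to toy_start_tool, so the linker keeps the symbol in the executable.
void toy_library_initialize(void) {
   (void) toy_start_tool(0, NULL);
}
```

The sketch assumes that a real runtime never passes version 0 together with a NULL runtime string, so the dummy call cannot be confused with a genuine one.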

Because no compiler provides a fully functional callback system for OpenMP, it is not clear which callbacks are best suited for OpenMP tracing. The only callbacks that are implemented in Vftrace and (almost) work are the parallel_begin/end callbacks. They start and end a region that shows up in the profile table.

static void vftr_ompt_callback_parallel_begin(ompt_data_t *encountering_task_data, const ompt_frame_t *encountering_task_frame, ompt_data_t *parallel_data, unsigned int requested_parallelism, int flags, const void *codeptr_ra) {
   vftr_omp_region_begin("omp_parallel_region", codeptr_ra);
}
static void vftr_ompt_callback_parallel_end(ompt_data_t *parallel_data, ompt_data_t *encountering_task_data, int flags, const void *codeptr_ra) {
   vftr_omp_region_end();
}
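
Tracing with a begin/end pair only works if the two callbacks arrive properly matched. The sketch below illustrates that bookkeeping with a small region stack. It is not Vftrace's implementation: all toy_ names, the fixed stack depth, and the counter are assumptions made for illustration only.

```c
#include <stddef.h>

#define TOY_MAX_DEPTH 16

// Hypothetical record of an open region.
typedef struct {
   const char *name;
   const void *address;
} toy_region_t;

static toy_region_t toy_stack[TOY_MAX_DEPTH];
static int toy_depth = 0;     // number of regions currently open
static int toy_completed = 0; // number of regions that were closed again

// called from the parallel_begin callback: open a new region
static int toy_region_begin(const char *name, const void *address) {
   if (toy_depth >= TOY_MAX_DEPTH) {return 0;}
   toy_stack[toy_depth].name = name;
   toy_stack[toy_depth].address = address;
   toy_depth++;
   return 1;
}

// called from the parallel_end callback: close the innermost open region
// returns 0 if no region is open, i.e. the end event had no matching begin
static int toy_region_end(void) {
   if (toy_depth == 0) {return 0;}
   toy_depth--;
   toy_completed++;
   return 1;
}
```

In this model an end event without a preceding begin is detected by the return value, and a begin without an end leaves toy_depth above zero when the program finishes.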

The implicit_task callback is probably a good choice for collecting more detailed information about the load of individual threads, but it is currently broken.

Known Bugs in OpenMP

Currently LLVM and Intel are the only compilers that support OpenMP 5.x and a callback interface. The following issues render the callback layer useless for now and prevent Vftrace from tracing OpenMP properly:

  • Threads do not leave the sync_region when the parallel region ends, but only when the next region begins: (Intel Forum)
  • Implicit_task callback functions segfault if the thread level is requested via omp_get_level(): (Github issue)
  • The scope_end callbacks are not issued at the correct point in time, which violates the standard: (Github issue)
  • Under specific circumstances the thread closing a parallel region is not the one required by the standard: (Github issue)