Threadsafe Stack Tracking
This section assumes that the problems with OpenMP (see OpenMP-Callbacks#Known Bugs in OpenMP) will be resolved by the compiler developers at some point.
The Call Tree Representation page shows that keeping track of a "current stack" enables stack-dependent profiling.
This works as long as the process runs with a single thread.
As soon as a process runs with multiple threads, a single "current stack" no longer makes sense, because different threads can be inside different functions at the same time.
In the following code, thread 0 calls foo, while thread 1 calls bar.
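#include <omp.h>   /* needed for omp_get_thread_num() */
int foo() {return 1;}
int bar() {return 1;}
int main(int argc, char **argv) {
   #pragma omp parallel num_threads(2)
   {
      if (omp_get_thread_num()%2 == 0) {
         foo();   /* even-numbered thread (thread 0) */
      } else {
         bar();   /* odd-numbered thread (thread 1) */
      }
   }
   return 0;
}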
This makes it necessary to keep track of a separate "current stack" for every thread. It gets even more complicated with nested thread structures, as in the following code:
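#include <omp.h>   /* needed for omp_get_thread_num() and omp_set_max_active_levels() */
int foo() {
   /* nested parallel region: each calling thread spawns a team of two */
   #pragma omp parallel num_threads(2)
   {
   }
   return 1;
}
int bar() {return 1;}
int main(int argc, char **argv) {
   /* nested regions are serialized unless nesting is enabled */
   omp_set_max_active_levels(2);
   #pragma omp parallel num_threads(4)
   {
      if (omp_get_thread_num()%2 == 0) {
         foo();   /* threads 0 and 2 */
      } else {
         bar();   /* threads 1 and 3 */
      }
   }
   return 0;
}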
The function foo, which is called by all even-numbered threads, spawns two threads itself.
The "current stack" bookkeeping therefore has to mirror the nested thread structure.
This is exactly what Vftrace does.
Within OpenMP, a thread can query its nesting level with omp_get_level(), its thread number within its current (innermost) team with omp_get_thread_num(), and the thread numbers of its parent, grandparent, ... with omp_get_ancestor_thread_num(level).
Together, these calls tell a thread where in the thread tree it is located at any point in time.
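As a sketch of how these calls combine, the following function collects the calling thread's position in the thread tree as a path of ancestor thread numbers, one entry per nesting level. The name thread_tree_path is hypothetical and not part of Vftrace. The serial stand-ins follow the OpenMP convention that outside any parallel region the level is 0 and only level 0 has an ancestor (thread 0); they only exist so the sketch also compiles without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins so the sketch also compiles without OpenMP. */
static int omp_get_level(void) { return 0; }
static int omp_get_ancestor_thread_num(int level) { return level == 0 ? 0 : -1; }
#endif

/* Hypothetical helper: writes the calling thread's ancestor thread number at
 * nesting level l into path[l-1], for l = 1..omp_get_level(), and returns the
 * number of entries written. The last entry is the thread's own number within
 * its innermost team, i.e. the same value omp_get_thread_num() returns. */
int thread_tree_path(int *path, int maxlen) {
    int level = omp_get_level();   /* counts active and inactive regions */
    if (level > maxlen) {
        level = maxlen;            /* truncate unusually deep nesting */
    }
    for (int l = 1; l <= level; l++) {
        path[l - 1] = omp_get_ancestor_thread_num(l);
    }
    return level;
}
```

Called inside the inner parallel region of foo in the second example, with nested parallelism enabled, thread 1 of the team spawned by outer thread 2 would obtain the path {2, 1}.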
Vftrace therefore uses the same information to construct its own thread tree,
in which each thread carries its own "current stack".
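#include <stdbool.h>   /* for bool */
typedef struct {
   int level;                   /* nesting level of this thread in the thread tree */
   int thread_num;              /* thread number within its team */
   bool master;
   threadstacklist_t stacklist; /* this thread's "current stack" (threadstacklist_t is defined below) */
   int parent_thread;           /* threadID (array index) of the parent thread */
   int threadID;                /* index of this thread in the threadtree array */
   int maxsubthreads;           /* allocated length of subthreads */
   int nsubthreads;             /* number of entries in use in subthreads */
   int *subthreads;             /* threadIDs of the threads spawned by this thread */
} thread_t;
typedef struct {
   int nthreads;                /* number of threads in the tree */
   int maxthreads;              /* allocated length of threads */
   thread_t *threads;           /* resizable array, indexed by threadID */
} threadtree_t;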
Similarly to the Call Tree Representation, the thread tree is stored
in a resizable array, where every thread has an ID that is identical to its position in the array.
Thus the parent_thread and the list of subthreads can be represented by integer indices.
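The index-based tree can be sketched as follows. The types and functions (sketch_thread_t, sketch_threadtree_t, sketch_add_thread, sketch_find_thread) are hypothetical simplifications, not Vftrace's code: they drop the stack list and the master flag, use node 0 as the root (the initial thread), and locate a thread by walking down from the root along a path of ancestor thread numbers, such as omp_get_ancestor_thread_num(l) for l = 1..omp_get_level(). The sketch is not thread-safe by itself; because sibling threads may add children to the same parent concurrently, a real implementation has to serialize insertions, for example with an OpenMP critical section or a lock.

```c
#include <stdlib.h>

/* Hypothetical, simplified node mirroring thread_t without stacklist/master. */
typedef struct {
    int level;          /* nesting level, root = 0 */
    int thread_num;     /* thread number within its team */
    int parent_thread;  /* index of the parent node, -1 for the root */
    int threadID;       /* index of this node in the tree array */
    int maxsubthreads;  /* allocated length of subthreads */
    int nsubthreads;    /* entries in use in subthreads */
    int *subthreads;    /* indices of the child nodes */
} sketch_thread_t;

typedef struct {
    int nthreads;
    int maxthreads;
    sketch_thread_t *threads;
} sketch_threadtree_t;

/* Appends a node below `parent` (-1 creates the root) and returns its ID.
 * Error handling for realloc is omitted in this sketch. */
static int sketch_add_thread(sketch_threadtree_t *tree, int parent, int thread_num) {
    if (tree->nthreads == tree->maxthreads) {
        tree->maxthreads = tree->maxthreads ? 2 * tree->maxthreads : 4;
        tree->threads = realloc(tree->threads, tree->maxthreads * sizeof *tree->threads);
    }
    int id = tree->nthreads++;
    sketch_thread_t *t = &tree->threads[id];   /* taken after any realloc */
    t->level = parent < 0 ? 0 : tree->threads[parent].level + 1;
    t->thread_num = thread_num;
    t->parent_thread = parent;
    t->threadID = id;
    t->maxsubthreads = 0;
    t->nsubthreads = 0;
    t->subthreads = NULL;
    if (parent >= 0) {
        sketch_thread_t *p = &tree->threads[parent];
        if (p->nsubthreads == p->maxsubthreads) {
            p->maxsubthreads = p->maxsubthreads ? 2 * p->maxsubthreads : 2;
            p->subthreads = realloc(p->subthreads, p->maxsubthreads * sizeof *p->subthreads);
        }
        p->subthreads[p->nsubthreads++] = id;
    }
    return id;
}

/* Walks from the root along path[0..len-1], creating missing nodes,
 * and returns the ID of the node the path ends at. */
static int sketch_find_thread(sketch_threadtree_t *tree, const int *path, int len) {
    if (tree->nthreads == 0) {
        sketch_add_thread(tree, -1, 0);   /* root = initial thread */
    }
    int id = 0;
    for (int l = 0; l < len; l++) {
        int next = -1;
        for (int i = 0; i < tree->threads[id].nsubthreads; i++) {
            int child = tree->threads[id].subthreads[i];
            if (tree->threads[child].thread_num == path[l]) {
                next = child;
                break;
            }
        }
        if (next < 0) {
            next = sketch_add_thread(tree, id, path[l]);
        }
        id = next;
    }
    return id;
}
```

Because parents and children refer to each other by index rather than by pointer, growing the thread array with realloc does not invalidate the tree links.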
Each thread keeps track of its "current stack" in its stacklist.
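typedef struct {
   int stackID;             /* ID of a stack in the call tree */
   int recursion_depth;
} threadstack_t;
typedef struct {
   int nstacks;             /* number of entries in use */
   int maxstacks;           /* allocated length of stacks */
   threadstack_t *stacks;   /* resizable array; the last entry is the current stack */
} threadstacklist_t;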
The stacklist is essentially a resizable list of stackIDs: a stackID is pushed onto it when a function entry hook is called, and popped off when a function exit hook is called.
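A minimal sketch of these push and pop operations, assuming the entry hook already knows the stackID of the entered function (looking it up in the call tree is not shown). The function names threadstack_push, threadstack_pop, and threadstack_current are hypothetical, and recursion_depth is simply set to 0 because its exact use is not described here.

```c
#include <stdlib.h>

typedef struct {
    int stackID;
    int recursion_depth;
} threadstack_t;

typedef struct {
    int nstacks;
    int maxstacks;
    threadstack_t *stacks;
} threadstacklist_t;

/* Hypothetical: called from a function entry hook with the stackID of the
 * entered function. Grows the list geometrically; realloc errors are ignored. */
static void threadstack_push(threadstacklist_t *list, int stackID) {
    if (list->nstacks == list->maxstacks) {
        list->maxstacks = list->maxstacks ? 2 * list->maxstacks : 8;
        list->stacks = realloc(list->stacks, list->maxstacks * sizeof *list->stacks);
    }
    list->stacks[list->nstacks].stackID = stackID;
    list->stacks[list->nstacks].recursion_depth = 0;   /* not modeled in this sketch */
    list->nstacks++;
}

/* Hypothetical: called from a function exit hook; returns the stackID that is
 * left, or -1 if the list is already empty. */
static int threadstack_pop(threadstacklist_t *list) {
    if (list->nstacks == 0) {
        return -1;
    }
    return list->stacks[--list->nstacks].stackID;
}

/* The thread's "current stack" is the top entry, or -1 if the list is empty. */
static int threadstack_current(const threadstacklist_t *list) {
    return list->nstacks > 0 ? list->stacks[list->nstacks - 1].stackID : -1;
}
```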