* Jie Chen <chen@xxxxxxxx> wrote:
and then you use this in the measurement loop:
  for (k=0; k<=OUTERREPS; k++){
    start = getclock();
    for (j=0; j<innerreps; j++){
#ifdef _QMT_PUBLIC
      delay((void *)0, 0);
#else
      delay(0, 0, 0, (void *)0);
#endif
    }
    times[k] = (getclock() - start) * 1.0e6 / (double) innerreps;
  }
the problem is, this does not take the overhead of gettimeofday() into account, and that overhead can easily reach 10 usecs (the observed regression). Could you try to eliminate the gettimeofday() overhead from your measurement?
gettimeofday overhead is something that might have changed from .21 to .22 on your box.
Ingo
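A minimal sketch of one way to estimate the timer cost and take it out of a sample, offered only as an illustration and not as the code from the test. The names getclock_sec(), timer_cost_usec() and corrected_usec() are hypothetical, getclock_sec() is assumed to mirror the test's gettimeofday-based getclock(), and the correction assumes each start/stop pair adds roughly one timer call's worth of time to the measured interval:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock time in seconds via gettimeofday(); assumed to resemble
 * the test's getclock(), which is not shown in the thread. */
static double getclock_sec(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec * 1.0e-6;
}

/* Hypothetical helper: average cost of one gettimeofday() call in usecs,
 * estimated by issuing `calls` back-to-back calls. */
static double timer_cost_usec(long calls)
{
    struct timeval tv;
    double start, end;
    long i;

    assert(calls > 0);
    start = getclock_sec();
    for (i = 0; i < calls; i++)
        gettimeofday(&tv, NULL);
    end = getclock_sec();
    return (end - start) * 1.0e6 / (double)calls;
}

/* Hypothetical helper: remove the timer's share from a per-iteration sample.
 * One start/stop pair brackets innerreps iterations, so the timer cost is
 * spread over innerreps (approximation: one call's cost per interval). */
static double corrected_usec(double raw_usec, double timer_usec, long innerreps)
{
    assert(innerreps > 0);
    return raw_usec - timer_usec / (double)innerreps;
}
```

With a 10 usec timer call, a sample taken with innerreps = 1 would carry the full 10 usecs, while one taken with innerreps = 1000 would carry about 0.01 usecs, so the size of innerreps decides how much the timer matters.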
Hi, Ingo:
In my pthread_sync code, I first call the refer() subroutine, which uses gettimeofday() to establish the elapsed time (the reference time) for a non-synchronized delay(). Each synchronization overhead value is then obtained by subtracting the reference time from the elapsed time measured with synchronization introduced. Since the quantity of interest here is the time difference (the overhead value), the effect of gettimeofday() should be minimal, unless gettimeofday() behaves differently when running 8 threads vs. running 2 threads.
I will try to replace gettimeofday with a lightweight timer call in my test code. Thank you very much.
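The reference-time subtraction described above could be sketched roughly as follows. This is an assumption-laden illustration, not the pthread_sync code: now_usec(), busy_delay(), time_per_call_usec() and sync_overhead_usec() are hypothetical names, and clock_gettime(CLOCK_MONOTONIC) is used only as a stand-in for "a lightweight timer call". Whether it is actually cheaper than gettimeofday() depends on the kernel and clock source.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Monotonic time in usecs; a stand-in for the lightweight timer. */
static double now_usec(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1.0e6 + (double)ts.tv_nsec * 1.0e-3;
}

/* Hypothetical stand-in for the test's delay(): spin for n iterations. */
static void busy_delay(long n)
{
    volatile long sink = 0;
    long i;

    for (i = 0; i < n; i++)
        sink += i;
}

typedef void (*body_fn)(long);

/* Average time per call of body(arg) over reps calls, in usecs.
 * The same routine is meant to time both the plain delay (reference)
 * and the delay with synchronization added. */
static double time_per_call_usec(body_fn body, long arg, long reps)
{
    double start;
    long i;

    assert(reps > 0);
    start = now_usec();
    for (i = 0; i < reps; i++)
        body(arg);
    return (now_usec() - start) / (double)reps;
}

/* Overhead = time with synchronization - reference time without it.
 * A timer cost that is identical in both runs cancels in the difference. */
static double sync_overhead_usec(double with_sync_usec, double reference_usec)
{
    return with_sync_usec - reference_usec;
}
```

The cancellation only holds if the timer costs the same in both runs, which is exactly the open question when the thread count differs.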
gettimeofday overhead is around 10 usecs here:
2740 1197359374.873214 gettimeofday({1197359374, 873225}, NULL) = 0 <0.000010>
2740 1197359374.970592 gettimeofday({1197359374, 970608}, NULL) = 0 <0.000010>
and that's the only thing that is going on when computing the reference time - and i see a similar syscall pattern in the PARALLEL and BARRIER calculations as well (with no real scheduling going on).
Ingo