[numbers] perfmon/pfmon overhead of 17%-94%
From: Ingo Molnar
Date: Sat Jun 27 2009 - 02:44:33 EST
* Ingo Molnar <mingo@xxxxxxx> wrote:
> Besides, you compare perfcounters to perfmon (which you seem to be
> a contributor of), while in reality perfmon has much, much worse
> (and unfixable, because designed-in) measurement overhead.
>
> So why are you criticising perfcounters for a 5000 cycles
> measurement overhead while perfmon has huge, _hundreds of
> millions_ of cycles measurement overhead (per second) for various
> realistic workloads? [ In fact in one of the scheduler-tests
> perfmon has a whopping measurement overhead of _nine billion_
> cycles, it increased total runtime of the workload from 3.3
> seconds to 6.6 seconds. (!) ]
Here are the more detailed perfmon/pfmon measurement overhead
numbers.
Test system is a "Intel Core2 E6800 @ 2.93GHz", 1 GB of RAM, default
Fedora install.
I've measured two workloads:
  hackbench.c      # messaging server benchmark (a rough sketch of this
                   # style of workload is included below)
  pipe-test-1m.c   # does 1 million pipe ops, similar to lat_pipe
                   # (full source attached at the end)
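For reference, here is a minimal sketch of the kind of messaging
workload hackbench exercises - many sender/receiver task pairs passing
small messages over pipes. This is NOT the hackbench.c that was
measured; the file name, group count and message count below are made
up purely for illustration:

/*
 * hackbench-sketch.c - illustrative sketch only, NOT the hackbench.c
 * measured below.  It mimics the general shape of the workload:
 * sender/receiver task pairs passing small messages over pipes.
 * GROUPS/MESSAGES/MSG_SIZE are made-up values, not hackbench defaults.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

#define GROUPS		10
#define MESSAGES	10000
#define MSG_SIZE	100

int main(void)
{
	char buf[MSG_SIZE];
	int g, i;

	memset(buf, 'x', sizeof(buf));

	for (g = 0; g < GROUPS; g++) {
		int fds[2];

		if (pipe(fds) < 0) {
			perror("pipe");
			exit(1);
		}
		if (!fork()) {
			/* receiver child: drain MESSAGES messages */
			close(fds[1]);
			for (i = 0; i < MESSAGES; i++)
				if (read(fds[0], buf, sizeof(buf)) <= 0)
					exit(1);
			exit(0);
		}
		/* parent acts as the sender for this group */
		close(fds[0]);
		for (i = 0; i < MESSAGES; i++)
			if (write(fds[1], buf, sizeof(buf)) <= 0)
				exit(1);
		close(fds[1]);
	}
	while (wait(NULL) > 0)
		;	/* reap all receiver children */
	return 0;
}

The only point of the sketch is that such a workload is dominated by
task creation and context switches - exactly what the numbers below
stress.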
v2.6.28+perfmon patches (v3, full):
 ./hackbench 10
    0.496400985 seconds time elapsed ( +- 1.699% )

 pfmon --follow-fork --aggregate-results ./hackbench 10
    0.580812999 seconds time elapsed ( +- 2.233% )
I.e. this workload runs 17% slower under pfmon; the measurement
overhead is about 1.45 billion cycles.
Furthermore, when running a 'pipe latency benchmark', an app that
does one million pipe reads and writes between two tasks (source
code attached below), I measured the following perfmon/pfmon
overhead:
 ./pipe-test-1m
    3.344280347 seconds time elapsed ( +- 0.361% )

 pfmon --follow-fork --aggregate-results ./pipe-test-1m
    6.508737983 seconds time elapsed ( +- 0.243% )
That's a measurement overhead of about 94%, or about 9.2 _billion_
cycles, on this test system.
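For clarity, here is the back-of-the-envelope arithmetic behind those
two figures, assuming the 2.93 GHz clock of the test box mentioned
above (this only reproduces the numbers from the elapsed times, it is
not part of the measurement itself):

/*
 * overhead-calc.c - recomputes the pipe-test slowdown and cycle
 * overhead from the elapsed times quoted above.
 */
#include <stdio.h>

int main(void)
{
	double ghz   = 2.93;		/* E6800 clock, in GHz	*/
	double plain = 3.344280347;	/* ./pipe-test-1m	*/
	double pfmon = 6.508737983;	/* same, under pfmon	*/

	printf("slowdown: %.1f%%\n", (pfmon / plain - 1.0) * 100.0);
	printf("overhead: %.2f billion cycles\n", (pfmon - plain) * ghz);
	return 0;
}

which prints a slowdown of about 94.6% and an overhead of about 9.27
billion cycles - consistent with the figures above.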
These perfmon/pfmon overhead figures are consistently reproducible,
and they show up on other test systems and with other workloads as
well. Basically, for any app that involves task creation or context
switching, perfmon adds considerable runtime overhead - well beyond
the overhead of perfcounters.
Ingo
-----------------{ pipe-test-1m.c }-------------------->
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <sys/wait.h>
#include <linux/unistd.h>

#define LOOPS 1000000

int main (void)
{
	int pipe_1[2], pipe_2[2];
	int m = 0, i;

	pipe(pipe_1);
	pipe(pipe_2);

	if (!fork()) {
		/* child: echo each message back to the parent */
		for (i = 0; i < LOOPS; i++) {
			read(pipe_1[0], &m, sizeof(int));
			write(pipe_2[1], &m, sizeof(int));
		}
	} else {
		/* parent: ping-pong one int with the child, LOOPS times */
		for (i = 0; i < LOOPS; i++) {
			write(pipe_1[1], &m, sizeof(int));
			read(pipe_2[0], &m, sizeof(int));
		}
		wait(NULL);
	}
	return 0;
}