* Ingo Molnar <mingo@xxxxxxx> wrote:
Times I believe are in nanoseconds for lmbench, anyway lower is better.Ouch, that looks unacceptably expensive. All the major distros turn CONFIG_PARAVIRT on. paravirt_ops was introduced in x86 with the express promise to have no measurable runtime overhead.
non pv AVG=464.22 STD=5.56
paravirt AVG=502.87 STD=7.36
Nearly 10% performance drop here, which is quite a bit... hopefully people are testing the speed of their PV implementations against non-PV bare metal :)
Here are some more precise stats done via hw counters on a perfcounters kernel using 'timec', running a modified version of the 'mmap performance stress-test' app i made years ago.
The MM benchmark app can be downloaded from:
http://redhat.com/~mingo/misc/mmap-perf.c
timec.c can be picked up from:
http://redhat.com/~mingo/perfcounters/timec.c
mmap-perf conducts 1 million mmap()/munmap()/mremap() calls, and touches the mapped area as well with a certain chance. The patterns are pseudo-random and the random seed is initialized to the same value so repeated runs produce the exact same mmap sequence.
I ran the test with a single thread and bound to a single core:
# taskset 2 timec -e -5,-4,-3,0,1,2,3 ./mmap-perf 1
[ I ran it as root - so that kernel-space hardware-counter statistics are included as well. ]
The results are quite surprisingly candid about the true costs of paravirt_ops on the native kernel's overhead (CONFIG_PARAVIRT=y):
-----------------------------------------------
| Performance counter stats for './mmap-perf' |
-----------------------------------------------
| |
| x86-defconfig | PARAVIRT=y |------------------------------------------------------------------
|
| 1311.554526 | 1360.624932 task clock ticks (msecs) +3.74%
| |
| 1 | 1 CPU migrations
| 91 | 79 context switches
| 55945 | 55943 pagefaults
| ............................................
| 3781392474 | 3918777174 CPU cycles +3.63%
| 1957153827 | 2161280486 instructions +10.43%
| 50234816 | 51303520 cache references +2.12%
| 5428258 | 5583728 cache misses +2.86%
| |
| 1314.782469 | 1363.694447 time elapsed (msecs) +3.72%
| |
-----------------------------------
The most surprising element is that in the paravirt_ops case we run 204 million more instructions - out of the ~2000 million instructions total.
That's an increase of over 10%!