Re: [tip:perfcounters/core] perf_counter: x86: Fix call-chain support to use NMI-safe methods

From: Ingo Molnar
Date: Mon Jun 15 2009 - 15:44:39 EST



* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> > If it's faster, this becomes a legit (albeit complex)
> > micro-optimization in a _very_ hot codepath.
>
> I don't think it's all that hot. It's not like it's the return to
> user mode.

Well, I guess it depends. For server apps it is true: syscalls are a
lot more dominant, MMs are long-running so any startup cost gets
amortized, and pagefaults are avoided.

For something like a kernel build we have 7 times as many pagefaults
as syscalls:

aldebaran:~/linux/linux> perf stat -- make -j32 >/dev/null
[...]
Performance counter stats for 'make -j32':

 1444281.076741  task-clock-msecs     #     14.429 CPUs
         219991  context-switches     #      0.000 M/sec
          18335  CPU-migrations       #      0.000 M/sec
       38465628  page-faults          #      0.027 M/sec
  4374762924204  cycles               #   3029.025 M/sec
  2645979309823  instructions         #      0.605 IPC
    42398991227  cache-references     #     29.356 M/sec
     4371920878  cache-misses         #      3.027 M/sec

100.097787566 seconds time elapsed.

So we have 38465628 page-faults, or one every 68788 instructions,
one every 113731 cycles.
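
The two "one every" figures can be reproduced from the raw counters. A minimal Python sketch, assuming the counts are copied verbatim from the perf stat output above (the variable names are illustrative, not perf identifiers):

```python
# Raw counts copied from the perf stat output above.
page_faults = 38465628
instructions = 2645979309823
cycles = 4374762924204

# Average distance between page faults, truncated to whole units.
instructions_per_fault = instructions // page_faults
cycles_per_fault = cycles // page_faults

print(instructions_per_fault)  # 68788
print(cycles_per_fault)        # 113731
```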

Saving 10 cycles in the page fault path means a 0.01% performance win
- or about 10 milliseconds shaved off the kernel build time.

100 cycles saved (which is impossible really in the entry/exit path)
would mean 0.1% win.
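
As a rough check of these estimates, the sketch below scales a per-fault saving by the number of page faults and compares it with the total cycle count. It assumes the saving translates linearly into wall-clock time, which is a simplification; `estimated_win` is a hypothetical helper name, not something from perf. The unrounded results come out near 0.009% and 0.09%, in line with the rounded 0.01% and 0.1% above.

```python
# Raw counts copied from the perf stat output above.
page_faults = 38465628
cycles = 4374762924204
elapsed_seconds = 100.097787566


def estimated_win(cycles_saved_per_fault):
    """Fraction of all cycles saved if every page fault gets cheaper by this much."""
    return cycles_saved_per_fault * page_faults / cycles


for saved in (10, 100):
    win = estimated_win(saved)
    # Assumption: the cycle saving maps linearly onto elapsed build time.
    saved_ms = win * elapsed_seconds * 1000
    print(f"{saved} cycles/fault: {win * 100:.3f}% win, about {saved_ms:.0f} ms")
```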

The same build does 5653639 syscalls (according to strace -c) - which
is a factor of 6.8 lower than the pagefault count. Same goes for shell
scripts or most of the clicking we do in a GUI.
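
The factor of 6.8 follows directly from the two counts. A short sketch, assuming the syscall total is the strace -c figure quoted above:

```python
page_faults = 38465628  # from the perf stat output
syscalls = 5653639      # from strace -c, as quoted in the text

# How many page faults occur per syscall during the build.
ratio = page_faults / syscalls
print(f"{ratio:.1f}")  # 6.8
```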

It's not a big factor for sure.

Btw., the biggest pagefault cost is in the fault handling itself
(the page clearing):

4.14% [k] do_page_fault
1.20% [k] sys_write
1.10% [k] sys_open
0.63% [k] sys_exit_group
0.48% [k] smp_apic_timer_interrupt
0.37% [k] sys_read
0.37% [k] sys_execve
0.20% [k] sys_mmap
0.18% [k] sys_close
0.14% [k] sys_munmap
0.13% [k] sys_poll
0.09% [k] sys_newstat
0.07% [k] sys_clone
0.06% [k] sys_newfstat

It totals 4.14% of the total cost (user-space cycles included) of
a kernel build, on a Nehalem box.

Ingo