Re: [tip:perfcounters/core] perf_counter: x86: Fix call-chain support to use NMI-safe methods

From: Ingo Molnar
Date: Fri Jun 19 2009 - 11:21:15 EST



* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Mon, 15 Jun 2009, Ingo Molnar wrote:
> >
> > See the numbers in the other mail: about 33 million pagefaults
> > happen in a typical kernel build - that's ~400K/sec - and that
> > is not a particularly pagefault-heavy workload.
>
> Did you do any function-level profiles?
>
> Last I looked at it, the real cost of page faults was all in the
> memory copies and page clearing, and while it would be nice to
> speed up the kernel entry and exit, the few tens of cycles we
> might be able to get from there really aren't all that important.

Yeah.

Here's the function level profiles of a typical kernel build on a
Nehalem box:

$ perf report --sort symbol

#
# (14317328 samples)
#
# Overhead  Symbol
# ........  ......
#
    44.05%  0x000000001a0b80
     5.09%  0x0000000001d298
     3.56%  0x0000000005742c
     2.48%  0x0000000014026d
     2.31%  0x00000000007b1a
     2.06%  0x00000000115ac9
     1.83%  [.] _int_malloc
     1.71%  0x00000000064680
     1.50%  [.] memset
     1.37%  0x00000000125d88
     1.28%  0x000000000b7642
     1.17%  [k] clear_page_c
     0.87%  [k] page_fault
     0.78%  [.] is_defined_config
     0.71%  [.] _int_free
     0.68%  [.] __GI_strlen
     0.66%  0x000000000699e8
     0.54%  [.] __GI_memcpy

Most of the overhead is in user-space symbols. (There's no proper
ELF+debuginfo on this box so they are unnamed.) It also shows that
page clearing and pagefault handling dominate the kernel overhead -
but they are dwarfed by other overhead. Any page-fault-entry costs
are a drop in the bucket.
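
(Aside: for the slicing below, the profile has to be recorded with
call-chains enabled. With today's tooling that is done via something
like the command below - the exact invocation is my assumption here,
not necessarily what was used for these numbers:

$ perf record -g -- make -j16

i.e. the -g option makes perf record save the full call-chain of
every sample.)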

In fact with call-chain graphs we can get a precise picture, as we
can do a non-linear 'slice' set operation over the samples and keep
only the ones that have the 'page_fault' pattern in one of their
parent functions:

$ perf report --sort symbol --parent page_fault

#
# (14317328 samples)
#
# Overhead  Symbol
# ........  ......
#
     1.12%  [k] clear_page_c
     0.87%  [k] page_fault
     0.43%  [k] get_page_from_freelist
     0.25%  [k] _spin_lock
     0.24%  [k] do_page_fault
     0.23%  [k] perf_swcounter_ctx_event
     0.16%  [k] perf_swcounter_event
     0.15%  [k] handle_mm_fault
     0.15%  [k] __alloc_pages_nodemask
     0.14%  [k] __rmqueue
     0.12%  [k] find_get_page
     0.11%  [k] copy_page_c
     0.11%  [k] find_vma
     0.10%  [k] _spin_lock_irqsave
     0.10%  [k] __wake_up_bit
     0.09%  [k] _spin_unlock_irqrestore
     0.09%  [k] do_anonymous_page
     0.09%  [k] __inc_zone_state

This "sub-profile" shows the true summary overhead that 'page_fault'
and all its child functions have. Note that for example clear_page_c
decreased from 1.17% to 1.12%:

1.12% [k] clear_page_c
1.17% [k] clear_page_c

because there's 0.05% of other callers to clear_page_c() that do not
involve page_fault. Those are filtered out via --parent
filtering/matching.
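
To make the 'slice' operation concrete, here is a toy C sketch of
the idea - this is not perf's actual implementation, and the
call-chains in it are made up for illustration. A sample is kept
only if one of its recorded parent symbols matches the pattern:

/* toy model of 'perf report --parent <pattern>' filtering */
#include <stdio.h>
#include <string.h>

struct sample {
        const char *sym;        /* symbol the counter hit */
        const char *chain[8];   /* caller symbols, NULL-terminated */
};

/*
 * Keep the sample if any parent in its call-chain matches the
 * pattern. (Plain substring match here, for simplicity.)
 */
static int parent_matches(const struct sample *s, const char *pat)
{
        int i;

        for (i = 0; s->chain[i]; i++)
                if (strstr(s->chain[i], pat))
                        return 1;
        return 0;
}

int main(void)
{
        struct sample samples[] = {
                /* pagefault-driven page clearing: kept */
                { "clear_page_c", { "do_anonymous_page", "handle_mm_fault",
                                    "do_page_fault", "page_fault", NULL } },
                /* clear_page_c via some other path: sliced away */
                { "clear_page_c", { "get_zeroed_page", NULL } },
        };
        int i;

        for (i = 0; i < 2; i++)
                printf("%-16s %s\n", samples[i].sym,
                       parent_matches(&samples[i], "page_fault") ?
                       "kept" : "filtered out");
        return 0;
}

The second sample above is the kind of hit that makes up the 0.05%
difference: clear_page_c() reached without page_fault anywhere in
its parent chain.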

Ingo