Re: [patch 2/3] mutex subsystem: fastpath inlining

From: Ingo Molnar
Date: Thu Dec 29 2005 - 03:40:35 EST



* Nicolas Pitre <nico@xxxxxxx> wrote:

> This is with all mutex patches applied and CONFIG_DEBUG_MUTEX_FULL=n,
> therefore using the current semaphore code:
>
> text data bss dec hex filename
> 1821108 287792 88264 2197164 2186ac vmlinux
>
> Now with CONFIG_DEBUG_MUTEX_FULL=y to substitute semaphores with
> mutexes:
>
> text data bss dec hex filename
> 1797108 287568 88172 2172848 2127b0 vmlinux
>
> Finally with CONFIG_DEBUG_MUTEX_FULL=y and fast paths inlined:
>
> text data bss dec hex filename
> 1807824 287136 88172 2183132 214fdc vmlinux
>
> This last case is not the smallest, but it is the fastest.

i.e. 1.3% text savings from going to mutexes, and inlining them again
gives up 0.5% of that. We've uninlined stuff for a smaller gain in the
past ...

> > Note that x86 went to a non-inlined fastpath _despite_
> > having a compact CISC semaphore fastpath.
>
> The function call overhead on x86 is less significant than the ARM
> one, so always calling out of line code might be sensible in that
> case.

i'm highly doubtful we should do that. The spinlock APIs are 4 times
more frequent than mutexes are ever going to be, still they too are
mostly out of line. (and we only inline the unlock portions that are a
space win!) Can you measure any significant difference in performance?
(e.g. lat_pipe triggers the mutex fastpath, in DEBUG_MUTEX_FULL=y mode)

the performance won by inlining is often offset by the performance cost
of the higher icache footprint. (and ARM CPUs dont have that large
caches to begin with)

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/