Re: [PATCH v3 2/8] lockdep: Introduce CROSSRELEASE_STACK_TRACE and make it not unwind as default

From: Ingo Molnar
Date: Tue Oct 24 2017 - 06:06:23 EST



Cannot pick up this series yet, but I have enhanced the changelog to:

=============>
Subject: locking/lockdep: Introduce CONFIG_CROSSRELEASE_STACK_TRACE and make it not unwind by default
From: Byungchul Park <byungchul.park@xxxxxxx>
Date: Tue, 24 Oct 2017 18:38:03 +0900

Johan Hovold reported a heavy performance regression caused by
lockdep cross-release:

> Boot time (from "Linux version" to login prompt) had in fact doubled
> since 4.13 where it took 17 seconds (with my current config) compared to
> the 35 seconds I now see with 4.14-rc4.
>
> A quick bisect pointed to lockdep and specifically the following commit:
>
> 28a903f63ec0 ("locking/lockdep: Handle non(or multi)-acquisition
> of a crosslock")
>
> which I've verified is the commit which doubled the boot time (compared
> to 28a903f63ec0^) (added by lockdep crossrelease series [1]).

Currently crossrelease unwinds the stack on every acquisition, which is
very expensive.

This patch makes the unwind optional and disables it by default, so that
only acquire_ip is recorded.

Full stack traces are sometimes required for full analysis, in which
case CROSSRELEASE_STACK_TRACE can be enabled.
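
As a rough user-space analogy only (this is not the kernel code, and
FULL_STACK_TRACE, record_acquire() and take_lock() are hypothetical names
made up for illustration), the trade-off can be sketched like this: the
cheap path stores only the immediate caller, while the optional path walks
the whole stack:

```python
import sys
import traceback

# Hypothetical switch standing in for CONFIG_CROSSRELEASE_STACK_TRACE.
FULL_STACK_TRACE = False


def record_acquire(full_trace=FULL_STACK_TRACE):
    """Record where a lock was acquired (toy analogy, not kernel code)."""
    caller = sys._getframe(1)
    # Cheap path: remember only the immediate caller, like acquire_ip.
    record = {"acquire_ip": (caller.f_code.co_name, caller.f_lineno)}
    if full_trace:
        # Expensive path: walk every frame from the entry point to the caller.
        record["trace"] = traceback.extract_stack(caller)
    return record


def take_lock(full_trace=False):
    # Stand-in for a lock acquisition site.
    return record_acquire(full_trace)


cheap = take_lock()
full = take_lock(full_trace=True)
print(cheap["acquire_ip"][0], "trace" in cheap, "trace" in full)
```

The cheap record still says where the acquisition happened, which is often
enough; the full trace is there only when asked for.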

On my QEMU Ubuntu machine (x86_64, 4 cores, 512M), the regression is
fixed. Boot times were measured with 'perf stat --null --repeat 10 $QEMU',
where $QEMU launches a kernel with init=/bin/true:

1. No lockdep enabled:

Performance counter stats for 'qemu_booting_time.sh bzImage' (10 runs):

2.756558155 seconds time elapsed ( +- 0.09% )

2. Lockdep enabled:

Performance counter stats for 'qemu_booting_time.sh bzImage' (10 runs):

2.968710420 seconds time elapsed ( +- 0.12% )

3. Lockdep enabled + crossrelease enabled:

Performance counter stats for 'qemu_booting_time.sh bzImage' (10 runs):

3.153839636 seconds time elapsed ( +- 0.31% )

4. Lockdep enabled + crossrelease enabled + this patch applied:

Performance counter stats for 'qemu_booting_time.sh bzImage' (10 runs):

2.963669551 seconds time elapsed ( +- 0.11% )

I.e. lockdep-crossrelease performance is now indistinguishable
from vanilla lockdep.
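
To put numbers on that, here is a small sketch that turns the four mean
elapsed times above into relative overheads. The helper overhead_pct() is
just an illustrative name; the inputs are copied from the measurements
above:

```python
# Mean elapsed seconds copied from the four 'perf stat' runs above.
NO_LOCKDEP = 2.756558155
LOCKDEP = 2.968710420
CROSSRELEASE = 3.153839636
CROSSRELEASE_PATCHED = 2.963669551


def overhead_pct(baseline, measured):
    """Relative slowdown of `measured` over `baseline`, in percent."""
    return (measured - baseline) / baseline * 100.0


results = [
    ("lockdep vs. no lockdep", overhead_pct(NO_LOCKDEP, LOCKDEP)),
    ("crossrelease vs. lockdep", overhead_pct(LOCKDEP, CROSSRELEASE)),
    ("crossrelease + patch vs. lockdep",
     overhead_pct(LOCKDEP, CROSSRELEASE_PATCHED)),
]

for label, value in results:
    print(f"{label}: {value:+.2f}%")
```

With the figures above this comes out at roughly +7.7%, +6.2% and -0.2%
respectively, the last one being of the same order as the reported +-
spread of the runs.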