Re: [PATCH v3 2/8] lockdep: Introduce CROSSRELEASE_STACK_TRACE and make it not unwind as default
From: Ingo Molnar
Date: Tue Oct 24 2017 - 06:06:23 EST
Cannot pick up this series yet, but I have enhanced the changelog to:
=============>
Subject: locking/lockdep: Introduce CONFIG_CROSSRELEASE_STACK_TRACE and make it not unwind by default
From: Byungchul Park <byungchul.park@xxxxxxx>
Date: Tue, 24 Oct 2017 18:38:03 +0900
Johan Hovold reported a heavy performance regression caused by
lockdep cross-release:
> Boot time (from "Linux version" to login prompt) had in fact doubled
> since 4.13 where it took 17 seconds (with my current config) compared to
> the 35 seconds I now see with 4.14-rc4.
>
> A quick bisect pointed to lockdep and specifically the following commit:
>
> 28a903f63ec0 ("locking/lockdep: Handle non(or multi)-acquisition
> of a crosslock")
>
> which I've verified is the commit which doubled the boot time (compared
> to 28a903f63ec0^) (added by lockdep crossrelease series [1]).
Currently, crossrelease unwinds the stack on every acquisition, which is
very expensive.
This patch makes the unwind optional and disables it by default, so that
only acquire_ip is recorded.
Full stack traces are sometimes required for full analysis, in which
case CROSSRELEASE_STACK_TRACE can be enabled.
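To illustrate the trade-off described above, here is a minimal user-space
sketch, not the kernel patch itself. It always records the caller's
instruction pointer and captures a full backtrace only when a compile-time
switch is set. The names struct acquire_record, record_acquire(),
MAX_TRACE_ENTRIES and SKETCH_FULL_STACK_TRACE are hypothetical, and glibc's
backtrace() merely stands in for the kernel unwinder:

```c
#ifdef SKETCH_FULL_STACK_TRACE
#include <execinfo.h>	/* backtrace(): glibc, stand-in for the kernel unwinder */
#endif

#define MAX_TRACE_ENTRIES 16	/* hypothetical limit for this sketch */

/* Hypothetical per-acquisition record. */
struct acquire_record {
	void *acquire_ip;			/* always recorded: cheap */
	int nr_entries;				/* stays 0 unless full traces are on */
	void *entries[MAX_TRACE_ENTRIES];	/* filled only with full traces */
};

/*
 * noinline, so that __builtin_return_address(0) refers to the caller
 * of record_acquire() rather than to some outer frame.
 */
__attribute__((noinline))
void record_acquire(struct acquire_record *rec)
{
	/* Cheap path: one address, no stack walk. */
	rec->acquire_ip = __builtin_return_address(0);
	rec->nr_entries = 0;

#ifdef SKETCH_FULL_STACK_TRACE
	/* Expensive path: walk the stack on every acquisition. */
	rec->nr_entries = backtrace(rec->entries, MAX_TRACE_ENTRIES);
#endif
}
```

Building with -DSKETCH_FULL_STACK_TRACE corresponds, loosely, to enabling
CROSSRELEASE_STACK_TRACE: every call then pays for a stack walk, while the
default build stores a single address.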
On my qemu Ubuntu machine (x86_64, 4 cores, 512M), the regression was
fixed. We measure boot times with 'perf stat --null --repeat 10 $QEMU',
where $QEMU launches a kernel with init=/bin/true:
1. No lockdep enabled:
    Performance counter stats for 'qemu_booting_time.sh bzImage' (10 runs):
       2.756558155 seconds time elapsed ( +- 0.09% )
2. Lockdep enabled:
    Performance counter stats for 'qemu_booting_time.sh bzImage' (10 runs):
       2.968710420 seconds time elapsed ( +- 0.12% )
3. Lockdep enabled + crossrelease enabled:
    Performance counter stats for 'qemu_booting_time.sh bzImage' (10 runs):
       3.153839636 seconds time elapsed ( +- 0.31% )
4. Lockdep enabled + crossrelease enabled + this patch applied:
    Performance counter stats for 'qemu_booting_time.sh bzImage' (10 runs):
       2.963669551 seconds time elapsed ( +- 0.11% )
I.e. lockdep-crossrelease performance is now indistinguishable
from vanilla lockdep.
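The relative overheads behind that conclusion can be worked out from the
four elapsed times above. A small sketch follows; the helper overhead_pct(),
the function print_boot_overheads() and the constant names are made up for
illustration, and only the four numbers come from the measurements:

```c
#include <stdio.h>

/* Mean elapsed boot times in seconds, copied from the four runs above. */
static const double BOOT_NO_LOCKDEP   = 2.756558155;
static const double BOOT_LOCKDEP      = 2.968710420;
static const double BOOT_CROSSRELEASE = 3.153839636;
static const double BOOT_PATCHED      = 2.963669551;

/* Relative change of 'measured' against 'base', in percent. */
double overhead_pct(double base, double measured)
{
	return (measured - base) / base * 100.0;
}

/* Print each configuration's boot time relative to its reference. */
void print_boot_overheads(void)
{
	printf("lockdep vs. no lockdep:        %+.2f%%\n",
	       overhead_pct(BOOT_NO_LOCKDEP, BOOT_LOCKDEP));
	printf("crossrelease vs. lockdep:      %+.2f%%\n",
	       overhead_pct(BOOT_LOCKDEP, BOOT_CROSSRELEASE));
	printf("crossrelease+patch vs lockdep: %+.2f%%\n",
	       overhead_pct(BOOT_LOCKDEP, BOOT_PATCHED));
}
```

The last figure comes out slightly negative and its magnitude is comparable
to the reported run-to-run variation (+- 0.11% and +- 0.12%), which is what
"indistinguishable" means here.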