Re: [PATCH] locking/lockdep: Disable KASAN instrumentation of lockdep.c

From: Waiman Long
Date: Mon Feb 03 2025 - 09:11:36 EST


On 2/3/25 6:24 AM, Peter Zijlstra wrote:
On Fri, Jan 31, 2025 at 04:47:06PM -0500, Waiman Long wrote:
On 1/31/25 11:50 AM, Waiman Long wrote:
Both KASAN and LOCKDEP are commonly enabled in building a debug kernel.
Each of them can significantly slow down the speed of a debug kernel.
Enabling KASAN instrumentation of the LOCKDEP code will further slow
things down.

Since LOCKDEP is a high-overhead debugging tool, it will never be
enabled in a production kernel. The LOCKDEP code is also pretty mature
and is unlikely to see major changes. There is also a possibility of
recursion similar to KCSAN. As the small advantage of having KASAN
catch potential memory access errors in this code is probably not
worth the cost of further slowing down a debug kernel, disable KASAN
instrumentation of lockdep.c to let a debug kernel regain a bit of
speed.

With a debug kernel with both LOCKDEP and KASAN enabled running on a
2-socket 144-thread system, the time to do a "make -j144" kernel build
was 18m40.641s. After applying this patch, the parallel kernel build
time was reduced to 17m35.136s. This is a reduction of about 66s (5.8%).

Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
---
kernel/locking/Makefile | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile
index 0db4093d17b8..8a588b0227b1 100644
--- a/kernel/locking/Makefile
+++ b/kernel/locking/Makefile
@@ -6,6 +6,7 @@ KCOV_INSTRUMENT := n
obj-y += mutex.o semaphore.o rwsem.o percpu-rwsem.o
# Avoid recursion lockdep -> sanitizer -> ... -> lockdep.
+KASAN_SANITIZE_lockdep.o := n
KCSAN_SANITIZE_lockdep.o := n
ifdef CONFIG_FUNCTION_TRACER
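For context, the diff follows the existing kbuild convention for opting individual objects (or a whole directory) out of instrumentation; a sketch of the pattern as it appears in this Makefile (the variable names are the real kbuild ones, the comments are mine):

```make
# Per-directory and per-object instrumentation opt-outs in kbuild:
KCOV_INSTRUMENT := n            # disable KCOV coverage for every object here
KASAN_SANITIZE_lockdep.o := n   # skip KASAN instrumentation for lockdep.o only
KCSAN_SANITIZE_lockdep.o := n   # skip KCSAN instrumentation for lockdep.o only
```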
The rationale behind this patch is that a similarly configured
PREEMPT_RT debug kernel was found to be about 3 times slower than the
non-RT debug kernel. On the same test system, the parallel build runtime
was 59m56.722s. After applying this patch, it is reduced to 38m3.348s.
This more-than-1/3 reduction is larger than I would have expected, so
the lockdep code is exercised much more heavily in a PREEMPT_RT debug
kernel.
Perhaps put that in the changelog instead?

It's not like RT is this secret out-of-tree project :-)

Also, any quick clues as to what causes the extra lockdep overhead?
Initially I thought perhaps local-lock, but that should also incur
lockdep overhead on !RT builds.

Yes, I am planning to update the patch with more RT debug kernel performance data.

As to why, my guess is that the average lock nesting depth is higher because spin_lock_irq*() no longer disables IRQs on RT and there is an extra wait lock underneath the rt-mutex. The increased number of sleep/wakeup cycles due to the sleeping-lock nature of the RT spinlock may also be a contributing factor.
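Roughly, the extra lockdep work per acquisition can be sketched like this (pseudocode, not the actual kernel call chains; the wait_lock step and sleep/wakeup path only apply under contention):

```
!RT: spin_lock_irq(&l)
         local_irq_disable()
         lockdep: acquire l                    one tracked acquisition, IRQs off

 RT: spin_lock_irq(&l)                         IRQs stay enabled
         rt_spin_lock(&l)
             lockdep: acquire l
             rt-mutex slow path (on contention)
                 raw_spin_lock(&wait_lock)
                     lockdep: acquire wait_lock    extra tracked acquisition
                 possible schedule()/wakeup cycle
```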

Cheers,
Longman