[PATCH v2 2/2] locking/lockdep: Disable KASAN instrumentation of lockdep.c
From: Waiman Long
Date: Tue Feb 04 2025 - 20:31:05 EST
Both KASAN and LOCKDEP are commonly enabled when building a debug
kernel. Each of them can significantly slow down a debug kernel, and
enabling KASAN instrumentation of the LOCKDEP code slows things down
even further.
Since LOCKDEP is a high-overhead debugging tool, it will never be
enabled in a production kernel. The LOCKDEP code is also pretty mature
and is unlikely to see major changes. There is also a possibility of
recursion similar to the KCSAN case already handled in the Makefile.
As the small advantage of KASAN instrumentation catching a potential
memory access error in the LOCKDEP code is probably not worth the
drawback of further slowing down a debug kernel, disable KASAN
instrumentation of lockdep.c so that a debug kernel regains some
performance.
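As a sanity check, one way to confirm that the resulting object file
is no longer instrumented is to look for KASAN runtime calls in it.
This is only a sketch: the object path assumes an in-tree build, and
the __asan_ prefix assumes generic KASAN (software tag-based KASAN
emits __hwasan_* calls instead).

  # Count KASAN instrumentation calls in lockdep.o; expect 0 with
  # this patch applied (path and symbol prefix are assumptions).
  $ nm kernel/locking/lockdep.o | grep -c '__asan_'
  0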
With debug kernels that have both LOCKDEP and KASAN enabled running on
a 2-socket 128-thread x86-64 system and an 80-core arm64 system, the
times (real and sys, from the time command) to do a parallel kernel
build are shown below.
Kernel type                    Real Time    Sys Time
-----------                    ---------    --------
x86-64:
Non-debug kernel                9m38.528s   304m17.007s
Debug kernel before patch      16m38.765s  1086m34.930s
Debug kernel after patch        16m4.758s  1025m26.335s
Before/after % change               -3.4%         -5.6%

Non-debug RT kernel            11m32.804s   121m52.835s
Debug RT kernel before patch   59m29.618s  1772m30.699s
Debug RT kernel after patch    37m47.089s   937m56.856s
Before/after % change              -36.5%        -47.1%

arm64:
Debug RT kernel before patch    46m9.385s   676m13.605s
Debug RT kernel after patch    33m41.428s    436m3.430s
Before/after % change              -27.0%        -35.5%
It looks like the KASAN instrumentation overhead is lower on arm64.
While the performance benefit for the non-RT debug kernel is modest,
the performance gain for the RT debug kernel is significant.
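For reference, the build-time numbers were gathered roughly as
follows. This is only a sketch: the -j value (matching the 128-thread
x86-64 system) and the presence of a pre-generated .config are
assumptions.

  # Parallel kernel build timed with the time command; the real and
  # sys times from this output populate the table above.
  $ time make -j128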
Looking at the RT kernel locking event data for the x86-64 system, we have
Event type           Non-debug  Debug before patch  Debug after patch
----------           ---------  ------------------  -----------------
rtlock_slowlock     66,593,828       2,868,760,165      2,832,990,386
rtlock_slow_acq1    43,705,130       2,833,575,907      2,800,928,283
rtlock_slow_acq2    22,888,698          35,177,418         32,055,592
rtlock_slow_sleep   22,568,560          29,206,559         27,833,274
rtmutex_slowlock       468,207             560,080            549,080
rtmutex_slow_acq1       11,840              67,208             39,353
rtmutex_slow_block     456,367             492,872            509,727
rtmutex_slow_sleep     258,071             208,019            220,480
The profiles of the debug kernel before and after the patch are
similar. Compared with the non-debug kernel, the rtlock_slowlock()
count has increased significantly, by more than 40x. That means the
corresponding wait_lock has to be acquired that many more times, each
time with the associated lockdep overhead. The average lock nesting
depth will also be higher. The non-RT debug kernel doesn't have this
extra overhead.
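The lock event counts above come from a kernel built with
CONFIG_LOCK_EVENT_COUNTS=y. A sketch of how to dump them, assuming the
usual debugfs mount point and a directory name that may vary across
kernel versions:

  # Dump the rtlock/rtmutex slowpath event counts via debugfs.
  $ grep . /sys/kernel/debug/lock_event_counts/rtlock_* \
           /sys/kernel/debug/lock_event_counts/rtmutex_*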
Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
---
kernel/locking/Makefile | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile
index 0db4093d17b8..a114949eeed5 100644
--- a/kernel/locking/Makefile
+++ b/kernel/locking/Makefile
@@ -5,7 +5,8 @@ KCOV_INSTRUMENT := n
 
 obj-y += mutex.o semaphore.o rwsem.o percpu-rwsem.o
 
-# Avoid recursion lockdep -> sanitizer -> ... -> lockdep.
+# Avoid recursion lockdep -> sanitizer -> ... -> lockdep & improve performance.
+KASAN_SANITIZE_lockdep.o := n
 KCSAN_SANITIZE_lockdep.o := n
 
 ifdef CONFIG_FUNCTION_TRACER
--
2.48.1