Re: [PATCH v2] mm/kmemleak: avoid soft lockup when scanning task stacks

From: Lance Yang

Date: Fri Jun 12 2026 - 12:53:03 EST



On Fri, Jun 12, 2026 at 08:16:07AM -0700, Breno Leitao wrote:
>kmemleak_scan() walks every thread and scans its kernel stack under a
>single rcu_read_lock() with no reschedule point. On a host with very
>many threads -- amplified by KASAN/lockdep in debug builds -- this loop
>can hog a CPU long enough to trip the soft lockup watchdog:
>
> watchdog: BUG: soft lockup - CPU#35 stuck for 22s! [kmemleak:537]
> scan_block
> kmemleak_scan
> kmemleak_scan_thread
> kthread
>
>A cond_resched() cannot be added directly: the loop runs inside an RCU
>read-side critical section.
>
>Borrow the rcu_lock_break() pattern from kernel/hung_task.c: when a
>reschedule is needed, pin the two iteration cursors, drop the RCU read
>lock, cond_resched(), then re-acquire it and continue only if both
>cursors are still hashed.
>
>If a cursor was unhashed while the lock was dropped, the thread list
>cannot be walked further, so the round is aborted. Such a round scans
>only part of the task stacks, which would make live objects look
>unreferenced, so reuse the existing "scan interrupted" path to skip
>reporting; the next full scan reports the real leaks.

TBH, this reads a bit dense to me as written ...
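
FWIW, here is a rough userspace sketch of the lock-break pattern the
changelog describes, in case it helps readers follow it. It is not the
kernel code. struct node, read_lock()/read_unlock(), yield_point(),
remove_in_gap and the break_every parameter are all made-up stand-ins
(for a task, the RCU read lock, cond_resched(), a concurrent exit, and
need_resched() respectively). It also tracks one cursor where the patch
tracks two.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task: a list node with a pin count and an
 * "still on the list" flag (the analogue of pid_alive()). */
struct node {
	struct node *next;
	int refs;
	bool hashed;
};

static int lock_depth;              /* models the read-side lock being held */
static struct node *remove_in_gap;  /* node unhashed while unlocked, or NULL */

static void read_lock(void)   { lock_depth++; }
static void read_unlock(void) { lock_depth--; }

/* Models cond_resched(): other work may run while the lock is dropped.
 * Here that "other work" optionally unhashes one node. */
static void yield_point(void)
{
	if (remove_in_gap) {
		remove_in_gap->hashed = false;
		remove_in_gap = NULL;
	}
}

/* Pin the cursor, drop the lock, yield, retake the lock, then report
 * whether the cursor can still be used to continue the walk. */
static bool lock_break(struct node *cur)
{
	bool can_cont;

	cur->refs++;            /* keep the node alive across the window */
	read_unlock();
	yield_point();
	read_lock();
	can_cont = cur->hashed; /* unhashed: its next pointer is not trusted */
	cur->refs--;

	return can_cont;
}

/* Walk the list, breaking the lock every break_every nodes. Returns the
 * number of nodes visited; *aborted is set if the walk stopped early. */
static int walk(struct node *head, int break_every, bool *aborted)
{
	int visited = 0;

	*aborted = false;
	read_lock();
	for (struct node *n = head; n; n = n->next) {
		visited++;
		if (visited % break_every == 0 && !lock_break(n)) {
			*aborted = true;
			break;
		}
	}
	read_unlock();

	return visited;
}
```

The caller is expected to treat an aborted walk as partial and skip any
conclusion that needs the full list, which is what the patch does by
skipping the leak report for that round.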

>Fixes: c4b28963fd79 ("mm/kmemleak: rely on rcu for task stack scanning")
>Cc: stable@xxxxxxxxxxxxxxx
>Signed-off-by: Breno Leitao <leitao@xxxxxxxxxx>
>---
>Changes in v2:
>- Do not create the nasty array, but use the same pattern as
> kernel/hung_task.c.
>- Link to v1: https://lore.kernel.org/r/20260611-kmemleak-stack-resched-v1-1-d6248ade5f4a@xxxxxxxxxx
>---
> mm/kmemleak.c | 42 ++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 40 insertions(+), 2 deletions(-)
>
>diff --git a/mm/kmemleak.c b/mm/kmemleak.c
>index 7c7ba17ce7af0..d88274dc0c605 100644
>--- a/mm/kmemleak.c
>+++ b/mm/kmemleak.c
>@@ -1695,6 +1695,32 @@ static void kmemleak_cond_resched(struct kmemleak_object *object)
> put_object(object);
> }
>
>+/*
>+ * Briefly drop the RCU read lock to reschedule during the task stack scan.
>+ * Both cursors are pinned across the gap; return false if either one was
>+ * unhashed meanwhile, so the caller stops this round instead of walking a
>+ * stale list.
>+ */

Personally, this looks a bit clunky to me with "gap" and "unhashed" ...

Maybe:

"
Drop RCU long enough to reschedule during task stack scanning. Keep both
cursors alive while RCU is dropped; return false if either cursor can no
longer continue the walk.
"

>+static bool kmemleak_stack_scan_break(struct task_struct *g,
>+ struct task_struct *p)
>+{
>+ bool can_cont;
>+
>+ get_task_struct(g);
>+ get_task_struct(p);
>+
>+ rcu_read_unlock();
>+ cond_resched();
>+ rcu_read_lock();
>+
>+ can_cont = pid_alive(g) && pid_alive(p);
>+
>+ put_task_struct(p);
>+ put_task_struct(g);
>+
>+ return can_cont;
>+}
>+
> /*
> * Print one leak inline. The hex dump is gated on OBJECT_ALLOCATED so it
> * does not touch user memory that was freed concurrently; the rest of the
>@@ -1804,6 +1830,7 @@ static void kmemleak_scan(void)
> int __maybe_unused i;
> struct xarray dedup;
> int new_leaks = 0;
>+ bool aborted = false;
>
> jiffies_last_scan = jiffies;
>
>@@ -1890,11 +1917,21 @@ static void kmemleak_scan(void)
> rcu_read_lock();
> for_each_process_thread(g, p) {
> void *stack = try_get_task_stack(p);
>+
> if (stack) {
> scan_block(stack, stack + THREAD_SIZE, NULL);
> put_task_stack(p);
> }
>+ /*
>+ * This is an expensive loop, we must to call the
>+ * scheduler to avoid lockups
>+ */

need_resched() plus the helper name already say most of it. Maybe just:

"
Break the RCU read-side section before rescheduling.
"

>+ if (need_resched() && !kmemleak_stack_scan_break(g, p)) {
>+ aborted = true;
>+ goto unlock;
>+ }
> }
>+unlock:
> rcu_read_unlock();
> }
>
>@@ -1937,9 +1974,10 @@ static void kmemleak_scan(void)
> scan_gray_list();
>
> /*
>- * If scanning was stopped do not report any new unreferenced objects.
>+ * If scanning was stopped or a stack scan round was aborted, do not
>+ * report any new unreferenced objects.
> */

Maybe just say "stack root scan was incomplete" here? That's the actual
reason we skip reporting.

"
If scanning was stopped or the stack root scan was incomplete, do not
report any new unreferenced objects.
"

>- if (scan_should_stop())
>+ if (scan_should_stop() || aborted)
> return;
>
> /*
>
>---

Apart from that, feel free to add:

Acked-by: Lance Yang <lance.yang@xxxxxxxxx>