Re: [PATCH v3 1/3] mm/kmemleak: avoid soft lockup when scanning task stacks

From: Catalin Marinas

Date: Mon Jun 15 2026 - 14:24:55 EST


On Mon, Jun 15, 2026 at 10:49:06AM -0700, Breno Leitao wrote:
> kmemleak_scan() walks every thread and scans its kernel stack under a
> single rcu_read_lock() with no reschedule point. On a host with very
> many threads -- amplified by KASAN/lockdep in debug builds -- this loop
> can hog a CPU long enough to trip the soft lockup watchdog:
>
>   watchdog: BUG: soft lockup - CPU#35 stuck for 22s! [kmemleak:537]
>     scan_block
>     kmemleak_scan
>     kmemleak_scan_thread
>     kthread
>
> A cond_resched() cannot be added directly: the loop runs inside an RCU
> read-side critical section.
>
> Walk the tasks one PID at a time with find_ge_pid(), taking the RCU read
> lock only to look up and pin each task. The stack is then scanned with no
> lock held, so cond_resched() runs between tasks and the scan stops early
> on scan_should_stop(). This follows the next_tgid()/task_seq_get_next()
> iteration pattern and keeps each RCU critical section short.
>
> Fixes: c4b28963fd79 ("mm/kmemleak: rely on rcu for task stack scanning")
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Breno Leitao <leitao@xxxxxxxxxx>
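
For anyone following along, the iteration described in the quoted commit
message could be modelled roughly as below. This is a userspace sketch
only, not the actual patch: struct task, the model_*() helpers and the
stop_after knob are hypothetical stand-ins for task_struct,
rcu_read_lock()/find_ge_pid(), cond_resched() and scan_should_stop(). The
point is just that the lock covers the lookup and the pin, nothing else.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: hypothetical stand-ins, not kernel APIs. */
struct task {
	int pid;
	int refcount;	/* models pinning the task */
	int scanned;	/* how many times the "stack" was scanned */
};

/* Must stay sorted by pid so the lookup below behaves like find_ge_pid(). */
static struct task tasks[] = { {1, 0, 0}, {7, 0, 0}, {42, 0, 0}, {537, 0, 0} };
#define NR_TASKS (sizeof(tasks) / sizeof(tasks[0]))

static int lock_depth;		/* models rcu_read_lock() nesting */
static int resched_points;	/* counts modelled cond_resched() calls */
static int stop_after = -1;	/* models scan_should_stop(); -1 = never */

static void model_read_lock(void)   { lock_depth++; }
static void model_read_unlock(void) { lock_depth--; }

/* Return the task with the lowest pid >= nr, or NULL if there is none. */
static struct task *model_find_ge_pid(int nr)
{
	assert(lock_depth > 0);		/* lookup only under the read lock */
	for (size_t i = 0; i < NR_TASKS; i++)
		if (tasks[i].pid >= nr)
			return &tasks[i];
	return NULL;
}

static void model_scan_stack(struct task *t)
{
	assert(lock_depth == 0);	/* scanning happens with no lock held */
	assert(t->refcount > 0);	/* ...but the task is pinned */
	t->scanned++;
}

static void model_cond_resched(void)
{
	assert(lock_depth == 0);	/* never reschedule inside the lock */
	resched_points++;
}

/* Walk the tasks one pid at a time; returns the number of tasks scanned. */
static int scan_task_stacks(void)
{
	int next = 1, scanned = 0;

	for (;;) {
		struct task *t;

		if (stop_after >= 0 && scanned >= stop_after)
			break;			/* early stop requested */

		model_read_lock();
		t = model_find_ge_pid(next);
		if (t)
			t->refcount++;		/* pin before dropping the lock */
		model_read_unlock();

		if (!t)
			break;			/* no more tasks */

		model_scan_stack(t);
		next = t->pid + 1;		/* resume after this pid */
		t->refcount--;			/* unpin */
		scanned++;
		model_cond_resched();		/* safe: no lock held here */
	}
	return scanned;
}
```

Calling scan_task_stacks() with stop_after left at -1 visits all four
model tasks; setting stop_after = 2 ends the walk after two. The asserts
in the helpers are there to make the locking rule explicit.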

I think the Fixes: tag here is just a marker for how far back the stable
backport should go. Before the above commit, we used
read_lock(&tasklist_lock), which probably had similar issues.

Reviewed-by: Catalin Marinas <catalin.marinas@xxxxxxx>

Thanks.

--
Catalin