Re: [PATCH v2] mm/kmemleak: avoid soft lockup when scanning task stacks
From: Catalin Marinas
Date: Fri Jun 12 2026 - 13:12:45 EST
Hi Breno,
Thanks for addressing this long-standing soft lockup problem.
On Fri, Jun 12, 2026 at 08:16:07AM -0700, Breno Leitao wrote:
> +/*
> + * Briefly drop the RCU read lock to reschedule during the task stack scan.
> + * Both cursors are pinned across the gap; return false if either one was
> + * unhashed meanwhile, so the caller stops this round instead of walking a
> + * stale list.
> + */
> +static bool kmemleak_stack_scan_break(struct task_struct *g,
> +				      struct task_struct *p)
> +{
> +	bool can_cont;
> +
> +	get_task_struct(g);
> +	get_task_struct(p);
> +
> +	rcu_read_unlock();
> +	cond_resched();
> +	rcu_read_lock();
> +
> +	can_cont = pid_alive(g) && pid_alive(p);
> +
> +	put_task_struct(p);
> +	put_task_struct(g);
> +
> +	return can_cont;
> +}
While this matches rcu_lock_break(), it looks to me like we rely too
much on the internals of kernel/exit.c. Ideally this function should be
provided as an API alongside for_each_process_thread() so that we only
have the idiom in one place in case something changes in the future.

Yet another variant below, untested. Basically, it follows the
next_tgid() or task_seq_get_next() approach (we might as well move this
to a separate function to avoid excessive indentation):

	if (kmemleak_stack_scan) {
		struct pid *pid;
		int nr = 1;

		do {
			struct task_struct *p = NULL;

			rcu_read_lock();
			pid = find_ge_pid(nr, &init_pid_ns);
			if (pid) {
				nr = pid_nr(pid) + 1;
				p = pid_task(pid, PIDTYPE_PID);
				if (p)
					get_task_struct(p);
			}
			rcu_read_unlock();

			if (p) {
				void *stack = try_get_task_stack(p);

				if (stack) {
					scan_block(stack, stack + THREAD_SIZE,
						   NULL);
					put_task_stack(p);
				}
				put_task_struct(p);
			}

			cond_resched();
		} while (pid);
	}
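
To illustrate why a numeric cursor tolerates tasks exiting while we
sleep, here is a small userspace model, not kernel code. Everything in
it is hypothetical and for illustration only: id_table stands in for
the PID namespace, table_find_ge() for find_ge_pid(), and
table_remove() for a task exiting in the window where the loop above
calls cond_resched(). It models only the cursor semantics, not RCU or
reference counting.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_MAX 16

/* Hypothetical stand-in for the PID namespace: a sorted array of live ids. */
struct id_table {
	int ids[TABLE_MAX];
	size_t count;
};

/* Models find_ge_pid(): smallest live id >= nr, or -1 if there is none. */
static int table_find_ge(const struct id_table *t, int nr)
{
	assert(t->count <= TABLE_MAX);

	for (size_t i = 0; i < t->count; i++)
		if (t->ids[i] >= nr)
			return t->ids[i];
	return -1;
}

/* Models a task exiting: drop the id and keep the array sorted. */
static void table_remove(struct id_table *t, int id)
{
	for (size_t i = 0; i < t->count; i++) {
		if (t->ids[i] != id)
			continue;
		for (size_t j = i + 1; j < t->count; j++)
			t->ids[j - 1] = t->ids[j];
		t->count--;
		return;
	}
}

/*
 * Walk the table using only a numeric cursor. After the first visit,
 * remove 'victim' (if >= 0) to model an exit during the unlocked gap.
 * Visited ids are stored in 'out'; the number visited is returned.
 */
static size_t walk_with_cursor(struct id_table *t, int victim,
			       int *out, size_t out_len)
{
	size_t visited = 0;
	int nr = 1;
	int id;

	while (visited < out_len && (id = table_find_ge(t, nr)) >= 0) {
		nr = id + 1;		/* advance the cursor before "sleeping" */
		out[visited++] = id;

		if (victim >= 0) {
			table_remove(t, victim);
			victim = -1;
		}
	}
	return visited;
}
```

The walker never holds a pointer into the table across the gap, only
the next number to look up, so a removed entry is simply skipped and
there is no stale list to walk. The cost is that each step is a fresh
lookup rather than a list traversal.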
--
Catalin