Re: [PATCH v2] mm/kmemleak: avoid soft lockup when scanning task stacks

From: Catalin Marinas

Date: Fri Jun 12 2026 - 13:12:45 EST

Hi Breno,

Thanks for addressing this long-standing soft lockup problem.

On Fri, Jun 12, 2026 at 08:16:07AM -0700, Breno Leitao wrote:
> +/*
> + * Briefly drop the RCU read lock to reschedule during the task stack scan.
> + * Both cursors are pinned across the gap; return false if either one was
> + * unhashed meanwhile, so the caller stops this round instead of walking a
> + * stale list.
> + */
> +static bool kmemleak_stack_scan_break(struct task_struct *g,
> +				      struct task_struct *p)
> +{
> +	bool can_cont;
> +
> +	get_task_struct(g);
> +	get_task_struct(p);
> +
> +	rcu_read_unlock();
> +	cond_resched();
> +	rcu_read_lock();
> +
> +	can_cont = pid_alive(g) && pid_alive(p);
> +
> +	put_task_struct(p);
> +	put_task_struct(g);
> +
> +	return can_cont;
> +}

While this matches rcu_lock_break(), it looks to me like we rely too
much on the internals of kernel/exit.c. Ideally this function should be
provided as an API alongside for_each_process_thread() so that we only
have the idiom in one place in case something changes in the future.

Yet another variant below, untested. Basically, it follows the
next_tgid() or task_seq_get_next() approach (we might as well move this
to a separate function to avoid excessive indentation):

	if (kmemleak_stack_scan) {
		struct pid *pid;
		int nr = 1;

		do {
			struct task_struct *p = NULL;

			rcu_read_lock();
			pid = find_ge_pid(nr, &init_pid_ns);
			if (pid) {
				nr = pid_nr(pid) + 1;
				p = pid_task(pid, PIDTYPE_PID);
				if (p)
					get_task_struct(p);
			}
			rcu_read_unlock();

			if (p) {
				void *stack = try_get_task_stack(p);

				if (stack) {
					scan_block(stack, stack + THREAD_SIZE,
						   NULL);
					put_task_stack(p);
				}
				put_task_struct(p);
			}
			cond_resched();
		} while (pid);
	}
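
The property this variant relies on can be illustrated outside the
kernel with a toy userspace analogue. It is only a sketch under stated
assumptions: alive[], find_ge_id(), walk_ids() and exit_5_after_2() are
hypothetical names standing in for the pid table, find_ge_pid(), the
scan loop and a task exiting during cond_resched(). The only state
carried across the gap is the integer nr, so an entry going away in the
meantime cannot leave the walker with a stale cursor.

```c
#include <stddef.h>

/* Hypothetical stand-in for the pid table: slot i is live if non-zero. */
#define MAX_ID 16
static int alive[MAX_ID];

/* Analogue of find_ge_pid(): smallest live id >= nr, or -1 if none. */
static int find_ge_id(int nr)
{
	for (int id = nr < 0 ? 0 : nr; id < MAX_ID; id++)
		if (alive[id])
			return id;
	return -1;
}

/*
 * Visit every live id starting from 1. Each iteration does a fresh
 * lookup from nr, so nothing found in a previous iteration is reused.
 * Returns the number of ids recorded in visited[].
 */
static int walk_ids(void (*gap)(int id), int *visited, int max)
{
	int nr = 1, n = 0, id;

	while ((id = find_ge_id(nr)) >= 0) {
		nr = id + 1;		/* resume point for the next lookup */
		if (n < max)
			visited[n++] = id;
		if (gap)
			gap(id);	/* stands in for the cond_resched() gap */
	}
	return n;
}

/* Example gap hook: entry 5 "exits" while entry 2 is being processed. */
static void exit_5_after_2(int id)
{
	if (id == 2)
		alive[5] = 0;
}
```

With entries 2, 5 and 9 live and exit_5_after_2() as the hook, the walk
records 2 and then 9: entry 5 is simply not found by the next lookup,
with no liveness check needed on a saved pointer.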

--
Catalin