Re: [PATCH v2] mm/kmemleak: avoid soft lockup when scanning task stacks

From: Oleg Nesterov

Date: Sat Jun 13 2026 - 06:45:43 EST


To avoid confusion: I see nothing wrong with this patch, but see
the question at the end.

On 06/12, Breno Leitao wrote:
>
> +/*
> + * Briefly drop the RCU read lock to reschedule during the task stack scan.
> + * Both cursors are pinned across the gap; return false if either one was
> + * unhashed meanwhile, so the caller stops this round instead of walking a
> + * stale list.
> + */
> +static bool kmemleak_stack_scan_break(struct task_struct *g,
> + struct task_struct *p)
> +{
> + bool can_cont;
> +
> + get_task_struct(g);
> + get_task_struct(p);
> +
> + rcu_read_unlock();
> + cond_resched();
> + rcu_read_lock();
> +
> + can_cont = pid_alive(g) && pid_alive(p);
> +
> + put_task_struct(p);
> + put_task_struct(g);
> +
> + return can_cont;
> +}

Perhaps we can rename and export rcu_lock_break() to avoid the duplication...

And, this is slightly off-topic, please ignore, but this reminds me of
[PATCH 1/2] introduce for_each_process_thread_break() and for_each_process_thread_continue()
https://lore.kernel.org/all/20180912163335.GA18748@xxxxxxxxxx/

> @@ -1890,11 +1917,21 @@ static void kmemleak_scan(void)
> rcu_read_lock();
> for_each_process_thread(g, p) {
> void *stack = try_get_task_stack(p);
> +
> if (stack) {
> scan_block(stack, stack + THREAD_SIZE, NULL);
> put_task_stack(p);
> }
> + /*
> + * This is an expensive loop, we must to call the
> + * scheduler to avoid lockups
> + */
> + if (need_resched() && !kmemleak_stack_scan_break(g, p)) {
> + aborted = true;
> + goto unlock;

Can this need_resched() check actually help if CONFIG_PREEMPTION &&
CONFIG_PREEMPT_RCU ?

In this case (let's ignore PREEMPT_DYNAMIC to simplify) rcu_read_lock()
doesn't disable preemption and cond_resched() is a nop, so need_resched()
is (almost) never true. Right?

I guess even in this case it makes sense to not abuse rcu_read_lock()
"too much", but perhaps we need something more clever than need_resched() ?

Note that check_hung_uninterruptible_tasks() uses time_after()...
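
To illustrate what I mean by a time-based check, here is a rough userspace
sketch, not a proposal for the actual patch. All names in it
(sketch_time_after, scan_break_state, scan_should_break) are made up for
illustration; in the kernel this would presumably be jiffies plus the real
time_after(), with some interval I am not going to guess at here. The point
is only that the lock is dropped after a bounded number of ticks even if
need_resched() never reports true.

```c
#include <stdbool.h>
#include <limits.h>

/*
 * Userspace stand-in for the kernel's time_after(a, b): true if tick
 * count a is later than b. The signed cast of the unsigned difference
 * keeps the comparison correct across counter wraparound.
 */
static bool sketch_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical bookkeeping for the scan loop. */
struct scan_break_state {
	unsigned long last_break;	/* tick count of the last lock break */
	unsigned long interval;		/* max ticks to hold the lock */
};

/*
 * Decide whether the loop should drop the lock now: either a reschedule
 * is pending, or more than 'interval' ticks have passed since the last
 * break. Records 'now' as the new last_break when it returns true.
 */
static bool scan_should_break(struct scan_break_state *s,
			      unsigned long now, bool resched_pending)
{
	if (!resched_pending &&
	    !sketch_time_after(now, s->last_break + s->interval))
		return false;

	s->last_break = now;
	return true;
}
```

In the loop above, the caller would pass the current tick count and the
result of need_resched(), and call the lock-break helper only when
scan_should_break() returns true.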

Oleg.