Re: [PATCH] sched/proxy_exec: Limit find_proxy_task() chain depth to prevent CPU hang

From: John Stultz

Date: Mon Apr 20 2026 - 22:28:24 EST

On Mon, Apr 13, 2026 at 10:36 PM <soolaugust@xxxxxxxxx> wrote:
>
> From: zhidao su <suzhidao@xxxxxxxxxx>
>
> find_proxy_task() follows the blocked_on chain with:
>
> for (p = donor; task_is_blocked(p); p = owner)
>
> The existing WARN_ON(owner == p) only detects immediate self-loops
> (a task waiting on a mutex it already owns). It does not detect
> multi-task cycles: if tasks A and B form a cycle where A waits on
> B's mutex and B waits on A's mutex, the chain traversal loops forever
> between A and B, hanging the CPU indefinitely while holding rq->lock.
>
> The scenario is real under PE: mutex-blocked tasks are kept on the
> runqueue (try_to_block_task() with should_block=false), so both A and
> B remain selectable by pick_next_task(). When A is selected as donor,
> find_proxy_task() follows A->mutex_B->owner=B->mutex_A->owner=A->...
> with no termination condition for cycles.
>
> rt-mutex handles this identically with max_lock_depth (default 1024),
> printing a warning and returning -EDEADLK when the chain is too deep.
>
> Add a chain_depth counter with MAX_PROXY_CHAIN_DEPTH=64. When exceeded,
> emit WARN_ONCE and call proxy_resched_idle() to schedule idle briefly,
> consistent with how other unresolvable states are handled in the
> function (e.g., owner migrating, curr_in_chain bailouts). This keeps
> the kernel healthy without spinning; the deadlock resolution is the
> caller's problem.

Nice. I used a very similar change myself when debugging proxy-exec
issues in the early days of getting ww_mutexes working properly. :)

> Tested with a built-in boot-param test (pe_cycle_test) that creates two
> kthreads on CPU 0 each holding one kernel mutex while trying to acquire
> the other, forming an A->B->A deadlock cycle.
>
> With this fix:
>
> [ 111.758150] sched/pe: proxy chain depth exceeded 64, possible deadlock cycle involving pid 120
> [ 111.758150] WARNING: CPU: 0 PID: 119 at kernel/sched/core.c:7339 __schedule+0x1e6e/0x1e80
> ...
> [ 112.694277] pe_cycle_test: still alive after 1s (CPU not hung)
>
> Without this fix, an NMI watchdog (nmi_watchdog=1, watchdog_thresh=15)
> fires a hard LOCKUP on CPU 0 with RIP in do_raw_spin_lock, called from
> __schedule, confirming the CPU spins inside find_proxy_task() holding
> rq->lock with no forward progress:
>
> [ 109.951781] watchdog: CPU0: Watchdog detected hard LOCKUP on cpu 0
> [ 109.951781] RIP: 0010:do_raw_spin_lock+0x3e/0xb0
> [ 109.951781] Call Trace:
> [ 109.951781] __schedule+0x11e7/0x1e10
> [ 109.951781] schedule_preempt_disabled+0x18/0x30
> [ 109.951781] __mutex_lock+0x6f0/0xac0
> [ 109.951781] pe_test_thread_a+0x9c/0xe0

So, I guess I'd be curious what happens without proxy-exec.

My sense is that if you have a mutex lock cycle today without proxy
execution you'll just deadlock and get a similar hard LOCKUP warning.
I assume you'd get a LOCKDEP splat as well if that was enabled in
either case, no?

So I'm not sure if I see a whole lot of benefit to rescheduling idle
over and over to keep the system sort of alive when that cpu is not
going to make any progress.

A few more thoughts below...

> Fixes: 7de9d4f94638 ("sched: Start blocked_on chain processing in find_proxy_task()")
> Signed-off-by: zhidao su <suzhidao@xxxxxxxxxx>
> ---
> kernel/sched/core.c | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 3f3425c6b2f2..bafb59432f7f 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7310,6 +7310,17 @@ DEFINE_LOCK_GUARD_1(blocked_on_lock, struct blocked_on_lock,
> * Returns the task that is going to be used as execution context (the one
> * that is actually going to be run on cpu_of(rq)).
> */
> +/*
> + * Limit proxy chain traversal depth to avoid infinite loops in pathological
> + * cases (e.g., A waits for B's mutex while B waits for A's mutex). The
> + * existing WARN_ON(owner == p) only catches immediate self-loops; multi-task
> + * cycles like A->B->A are not detected without a depth counter.
> + *
> + * rt-mutex uses a similar guard (max_lock_depth = 1024). We use a smaller
> + * limit since proxy chains are expected to be short in practice.
> + */
> +#define MAX_PROXY_CHAIN_DEPTH 64

So while we'd hope proxy chains are short in most cases, there's no
guarantee they would be different from rt-mutexes.
In fact, with rwsem support, the chains could interleave across lock
types, so I'd probably at least match the rt-mutex value.
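
To illustrate the concern about the limit, here is a minimal user-space
sketch. It is not kernel code: struct chain_node, chain_build() and
chain_walk_ok() are hypothetical names made up for this example, and
"next" stands in for "owner of the lock this task is blocked on". It
only shows that a depth counter cannot tell a long but acyclic chain
from a cycle, so a small limit risks false positives.

```c
#include <stddef.h>

/*
 * Hypothetical user-space model, not kernel code: each node is a task and
 * `next` is the owner of the lock it is blocked on (NULL when runnable).
 */
struct chain_node {
	struct chain_node *next;
};

/* Link nodes[0] -> nodes[1] -> ... -> nodes[n-1]; the last one is runnable. */
static void chain_build(struct chain_node *nodes, size_t n)
{
	for (size_t i = 0; i < n; i++)
		nodes[i].next = (i + 1 < n) ? &nodes[i + 1] : NULL;
}

/*
 * Return 1 if the walk from `head` reaches a runnable node within
 * `max_depth` hops, or 0 if the depth limit trips first.
 */
static int chain_walk_ok(const struct chain_node *head, int max_depth)
{
	int depth = 0;

	for (const struct chain_node *p = head; p->next; p = p->next)
		if (++depth > max_depth)
			return 0;
	return 1;
}
```

In this model a perfectly valid 100-hop chain is rejected with a limit of
64 but accepted with 1024, the rt-mutex default quoted in the patch
description.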

> static struct task_struct *
> find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
> __must_hold(__rq_lockp(rq))
> @@ -7318,11 +7329,17 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
> struct task_struct *owner = NULL;
> bool curr_in_chain = false;
> int this_cpu = cpu_of(rq);
> + int chain_depth = 0;
> struct task_struct *p;
> int owner_cpu;
>
> /* Follow blocked_on chain. */
> for (p = donor; task_is_blocked(p); p = owner) {
> + if (++chain_depth > MAX_PROXY_CHAIN_DEPTH) {
> + WARN_ONCE(1, "sched/pe: proxy chain depth exceeded %d, possible deadlock cycle involving pid %d\n",
> + MAX_PROXY_CHAIN_DEPTH, p->pid);
> + return proxy_resched_idle(rq);

So at this point the cpu is going to be stuck: as soon as it
switches to idle, it will call back into __schedule(), select the same
donor task, traverse the same chain, and then reschedule idle
and start again.

So it seems to me like BUG() would be more appropriate here as the cpu
is effectively deadlocked.
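
A toy user-space model of that loop, as a rough sketch only: the toy_*
names and TOY_MAX_CHAIN_DEPTH are hypothetical stand-ins, not the
kernel's task_struct, mutex or find_proxy_task(), and a NULL return
stands in for "go idle". It assumes every mutex in the chain has an
owner.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; not the kernel's structures. */
struct toy_mutex;

struct toy_task {
	int pid;
	struct toy_mutex *blocked_on;	/* NULL when the task is runnable */
};

struct toy_mutex {
	struct toy_task *owner;		/* assumed non-NULL in this model */
};

#define TOY_MAX_CHAIN_DEPTH 64

/*
 * Walk donor -> blocked_on -> owner -> ... and return the first task that
 * is not blocked, or NULL once the depth limit is exceeded.
 */
static struct toy_task *toy_find_proxy(struct toy_task *donor)
{
	struct toy_task *p;
	int depth = 0;

	assert(donor != NULL);
	for (p = donor; p->blocked_on; p = p->blocked_on->owner) {
		assert(p->blocked_on->owner != NULL);
		if (++depth > TOY_MAX_CHAIN_DEPTH)
			return NULL;
	}
	return p;
}

/*
 * Model repeated scheduling passes on a cpu whose only pick is `donor`.
 * Returns how many of `passes` ended in the idle fallback.
 */
static int toy_count_idle_passes(struct toy_task *donor, int passes)
{
	int idle = 0;

	for (int i = 0; i < passes; i++)
		if (!toy_find_proxy(donor))
			idle++;
	return idle;
}
```

With A blocked on B's mutex and B blocked on A's, every pass in this
model ends in the idle fallback, since nothing between passes changes
the chain.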

I guess one could deactivate the selected blocked donor task, which
would let the cpu continue to run other tasks, but the entire lock
chain would eventually get deactivated and would never be woken up, so
it would likely trip hung task warnings. So I'd lean towards
BUG(), since lock cycles are a big no-no (for non-ww_mutexes), and I'd
fret that if you don't stop the system, folks will just ignore the
warnings and not really understand why things aren't working properly.

But that's just my instinct.

Anyway, thanks for the submission here! I'm excited to see more folks
working and testing with proxy-exec!

thanks
-john