Re: [PATCH] sched: Re-evaluate scheduling when migrating queued tasks out of throttled cgroups
From: Aaron Lu
Date: Mon Feb 02 2026 - 02:15:52 EST
On Fri, Jan 30, 2026 at 05:03:49PM +0800, Zicheng Qu wrote:
> On 1/30/2026 4:34 PM, Zicheng Qu wrote:
>
> > 4) For kernel <= 5.10: Later, cgroup A is unthrottled. However, the task
> > P has already been migrated out of cgroup A, so unthrottle_cfs_rq()
> > may observe load_weight == 0 and return early without calling
> > resched_curr(). For kernel >= 6.6: The unthrottling path triggers
> > `resched_curr()` in almost all cases, even when no runnable tasks remain
> > in the unthrottled cgroup, preventing the idle stall described above.
> > However, if cgroup A is removed before it gets unthrottled, the
> > unthrottling path for cgroup A is never executed. As a result,
> > `resched_curr()` is never called.
I think you are right.
> Hi Aaron,
>
> Apologies for the confusion in my earlier description — the original
> failure model was identified and analyzed on kernels based on LTS 5.10.
>
> Later I realized that on v6.6 and mainline, the issue becomes much harder
> to reproduce because of the additional check (cfs_rq->on_list)
> introduced in unthrottle_cfs_rq(), which effectively masks the
> original reproduction path.
>
> As a result, I adjusted the reproducer accordingly. With the updated
> reproducer, the issue can still be triggered on mainline by explicitly
> bypassing the unthrottling reschedule path, as described in the commit
> message.
>
I can now reproduce the problem with your updated reproducer, and I have
verified that your patch fixes it, so feel free to add:
Tested-by: Aaron Lu <ziqianlu@xxxxxxxxxxxxx>