Re: [PATCH] sched: Re-evaluate scheduling when migrating queued tasks out of throttled cgroups
From: Peter Zijlstra
Date: Mon Feb 02 2026 - 07:50:58 EST
On Fri, Jan 30, 2026 at 08:34:38AM +0000, Zicheng Qu wrote:
> Consider the following sequence on a CPU configured with nohz_full:
>
> 1) A task P runs in cgroup A, and cgroup A becomes throttled due to CFS
> bandwidth control. The gse (cgroup A) to which task P is attached is
> dequeued and the CPU switches to idle.
>
> 2) Before cgroup A is unthrottled, task P is migrated from cgroup A to
> another cgroup B (not throttled).
>
> During sched_move_task(), the task P is observed as queued but not
> running, and therefore no resched_curr() is triggered.
>
> 3) Since the CPU is nohz_full, it remains in do_idle() waiting for an
> explicit scheduling event, i.e., resched_curr().
>
> 4) For kernels <= 5.10: Later, cgroup A is unthrottled. However, task P
> has already been migrated out of cgroup A, so unthrottle_cfs_rq() may
> observe load_weight == 0 and return early without calling
> resched_curr(). For kernels >= 6.6: The unthrottling path calls
> `resched_curr()` in almost all cases, even when no runnable tasks
> remain in the unthrottled cgroup, which prevents the idle stall
> described above. However, if cgroup A is removed before it is
> unthrottled, the unthrottling path for cgroup A never runs. As a
> result, `resched_curr()` is never called.
>
> 5) At this point, the task P is runnable in cgroup B (not throttled), but
> the CPU remains in do_idle() with no pending reschedule point. The
> system stays in this state until an unrelated event (e.g. a new task
> wakeup) triggers resched_curr() and breaks the nohz_full idle state;
> only then does task P finally get scheduled.
>
> The root cause is that sched_move_task() may classify the task as only
> queued, not running, and therefore fails to trigger a resched_curr(),
> while the later unthrottling path no longer has visibility of the
> migrated task.
>
> Preserve the existing behavior for running tasks by issuing
> resched_curr(), and explicitly invoke check_preempt_curr() for tasks
> that were queued at the time of migration. This ensures that runnable
> tasks are reconsidered for scheduling even when nohz_full suppresses
> periodic ticks.
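
For readers following along, the difference between the two behaviours
can be sketched with a toy user-space model. This is not kernel code:
every toy_* name below is hypothetical, and the "check_queued" branch is
only an assumed stand-in for what check_preempt_curr() achieves when the
CPU is sitting in idle.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these types are hypothetical, not the kernel's. */
struct toy_cpu {
	bool idle;          /* parked in the idle loop, no periodic tick */
	bool need_resched;  /* set by a resched_curr()-like request */
};

struct toy_task {
	bool queued;           /* on a runqueue */
	bool running;          /* currently executing on the CPU */
	bool group_throttled;  /* its current cgroup is throttled */
};

/* Stand-in for resched_curr(): ask the CPU to reschedule. */
static void toy_resched(struct toy_cpu *cpu)
{
	cpu->need_resched = true;
}

/*
 * Stand-in for moving a task to another cgroup.
 * check_queued == false models the old behaviour (only running tasks
 * trigger a reschedule); true models the behaviour described above.
 */
static void toy_move_task(struct toy_cpu *cpu, struct toy_task *p,
			  bool dst_throttled, bool check_queued)
{
	assert(cpu && p);

	p->group_throttled = dst_throttled;

	if (p->running)
		toy_resched(cpu);
	else if (check_queued && p->queued && !p->group_throttled && cpu->idle)
		toy_resched(cpu);  /* assumed effect of check_preempt_curr() */
}

/* Replays steps 1-2 and reports whether a reschedule was requested. */
static bool toy_scenario(bool check_queued)
{
	struct toy_cpu cpu = { .idle = true, .need_resched = false };
	struct toy_task p = {
		.queued = true, .running = false, .group_throttled = true,
	};

	/* Move P from throttled cgroup A to unthrottled cgroup B. */
	toy_move_task(&cpu, &p, false, check_queued);
	return cpu.need_resched;
}
```

In this model toy_scenario(false) leaves need_resched clear, so the idle
CPU has no reason to pick up P, while toy_scenario(true) sets it.
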
>
> Fixes: 29f59db3a74b ("sched: group-scheduler core")
> Signed-off-by: Zicheng Qu <quzicheng@xxxxxxxxxx>
> Reviewed-by: K Prateek Nayak <kprateek.nayak@xxxxxxx>
> Reviewed-by: Aaron Lu <ziqianlu@xxxxxxxxxxxxx>
Yes, that makes sense.
Thanks!