Re: [PATCH] sched/fair: Fix wakeup_preempt_fair for not waking up task

From: K Prateek Nayak

Date: Thu Apr 30 2026 - 03:50:30 EST


Hello Furkan,

On 4/30/2026 11:46 AM, Furkan Çalışkan wrote:
> On 4/29/26 19:41, Vincent Guittot wrote:
>> The assumption that p is always enqueued and not delayed, is only true for
>> wakeup. If p was moved while sched_delayed, pick_next_entity will dequeue
>> it during the attach and the cfs might become empty.
>>
>> Fixes: ac8e69e69363 ("sched/fair: Fix wakeup_preempt_fair() vs delayed dequeue")
>> Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
>> ---
>>
>> I have triggered this while running my latency stress test on a new platform.
>>
>> kernel/sched/fair.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 728965851842..99fb524c4922 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -9147,7 +9147,7 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
>> * Because p is enqueued, nse being null can only mean that we
>> * dequeued a delayed task.
>> */
>> - if (!nse)
>> + if (!nse && (wake_flags & WF_TTWU))
>> goto pick;
>>
>> if (sched_feat(RUN_TO_PARITY))
>
> When a sched_delayed task is migrated (which can only happen via
> MIGRATE_LOAD per can_migrate_task()), enqueuing it on the dest cpu will
> call wakeup_preempt_fair immediately, and if the dest cpu is not busy,
> pick_next_entity() will likely pick and dequeue it immediately. So a
> wasted enqueue+dequeue pair. Could we skip the enqueue when
> sched_delayed is set, and defer it to the actual wakeup path?

That needs some consideration. If we are migrating a delayed task to
an idle CPU, we can readily block the delayed task, provided there are
no other tasks on the migration list.

If the destination is busy, or if we are migrating a bunch of tasks,
we need to know what the final state of the tasks_timeline will be
before deciding whether it is okay to block them immediately.

We need to know where avg_vruntime() and the deadline end up to know
whether the task will get picked immediately, and we cannot do that
without going through place_entity() + __enqueue_entity().
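
To illustrate why the final queue state matters, here is a toy userspace
model, not kernel code. The toy_se struct and the toy_*() helpers are
made up for this example, and it assumes the usual EEVDF rule: an entity
is eligible if its vruntime is at or below the weighted average, and the
eligible entity with the earliest deadline is picked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a sched_entity; NOT the kernel structure. */
struct toy_se {
	int64_t weight;
	int64_t vruntime;
	int64_t deadline;
};

/*
 * Eligible iff vruntime <= weighted average vruntime of the queue.
 * Compared as vruntime * sum(w) <= sum(w * v) to avoid a division.
 */
static int toy_eligible(const struct toy_se *q, size_t n, size_t idx)
{
	int64_t sum_w = 0, sum_wv = 0;
	size_t i;

	assert(idx < n);
	for (i = 0; i < n; i++) {
		sum_w += q[i].weight;
		sum_wv += q[i].weight * q[i].vruntime;
	}
	return q[idx].vruntime * sum_w <= sum_wv;
}

/* Index of the eligible entity with the earliest deadline, -1 if empty. */
static int toy_pick(const struct toy_se *q, size_t n)
{
	int best = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!toy_eligible(q, n, i))
			continue;
		if (best < 0 || q[i].deadline < q[best].deadline)
			best = (int)i;
	}
	return best;
}
```

With two equal-weight entities at vruntime 100 and 120, the first one
is eligible and gets picked. Add a third entity at vruntime 20 and the
average drops to 80, so the same first entity is no longer eligible.
The answer for one task depends on everything else that lands on the
queue, which is why it cannot be decided up front.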

There is also a cgroup implication: the delayed task might not be
picked immediately if it is in a cgroup whose entity is not eligible,
and finding that out requires going through the full enqueue + pick.
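
A toy sketch of the hierarchy case, again userspace only and with
made-up names (toy_level, toy_first_ineligible_level). It checks only
eligibility at each level, which is a necessary condition for the pick
to descend, not a sufficient one, since the deadline also matters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One level on the path from the root cfs_rq down to the task, root
 * first. Hypothetical type: it holds the vruntime of the entity on our
 * path (group entity or the task itself) and the average vruntime of
 * the queue that entity sits on.
 */
struct toy_level {
	int64_t vruntime;
	int64_t avg_vruntime;
};

/*
 * Walk top-down and return the first level at which the entity on our
 * path is not eligible. Returns depth if every level is eligible, in
 * which case the pick may reach the task.
 */
static size_t toy_first_ineligible_level(const struct toy_level *path,
					 size_t depth)
{
	size_t i;

	assert(depth > 0);
	for (i = 0; i < depth; i++) {
		if (path[i].vruntime > path[i].avg_vruntime)
			return i;
	}
	return depth;
}
```

If the group entity at the root has vruntime 70 against an average of
60, the walk stops at level 0 even though the task is eligible within
its own group, so the delayed task would stay queued for now.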

--
Thanks and Regards,
Prateek