Re: [PATCH v3 2/2] sched: update the rq->avg_idle when a task is moved to an idle CPU
From: K Prateek Nayak
Date: Thu Nov 27 2025 - 05:12:39 EST
Hello Huang Shijie,
On 11/27/2025 2:44 PM, Huang Shijie wrote:
>  void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
>  {
> +	int delayed = p->se.sched_delayed;
> +
>  	if (!(flags & ENQUEUE_NOCLOCK))
>  		update_rq_clock(rq);
>
> @@ -2100,6 +2117,13 @@ void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
>
>  	if (sched_core_enabled(rq))
>  		sched_core_enqueue(rq, p);
> +
> +	if (delayed) {
> +		if (entity_eligible(cfs_rq_of(&p->se), &p->se))
> +			update_rq_avg_idle(rq);
Question: Why do we want to treat the delayed case like this?
If the entity is not eligible, do we want to consider that it
hasn't even gone through a wakeup? Wouldn't this lead to the next
wakeup seeing a non-zero rq->idle_stamp and inaccurately
accounting more idle time?
Also, if we've done a newidle balance and rq->idle_stamp is
set, we cannot have delayed tasks on the rq since pick_next_task()
would have dequeued all delayed tasks before reaching newidle
balance.
Just doing an update_rq_avg_idle() unconditionally should be
fine.
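To illustrate the stale idle_stamp concern, here is a toy user-space
model, not kernel code. struct toy_rq and the toy_*() helpers are made
up for this sketch, and it assumes update_rq_avg_idle() folds
(clock - idle_stamp) into avg_idle with the same 1/8 weighting as
update_avg() and then clears idle_stamp:

```c
#include <stdint.h>

/* Toy model of the rq idle bookkeeping. All names here are hypothetical. */
struct toy_rq {
	uint64_t clock;      /* current time in ns */
	uint64_t idle_stamp; /* when the CPU went idle, 0 if not recorded */
	uint64_t avg_idle;   /* running average of idle durations in ns */
};

/* The CPU goes idle: remember when the idle period started. */
static void toy_go_idle(struct toy_rq *rq)
{
	rq->idle_stamp = rq->clock;
}

/*
 * Assumed shape of update_rq_avg_idle(): fold the time elapsed since
 * idle_stamp into avg_idle (1/8 weight) and clear the stamp.
 */
static void toy_update_avg_idle(struct toy_rq *rq)
{
	int64_t diff;

	if (!rq->idle_stamp)
		return;

	diff = (int64_t)(rq->clock - rq->idle_stamp) - (int64_t)rq->avg_idle;
	rq->avg_idle = (uint64_t)((int64_t)rq->avg_idle + diff / 8);
	rq->idle_stamp = 0;
}

/*
 * The CPU idles for 100ns, a task is enqueued, the CPU is then busy
 * for 1000ns, and a second wakeup lands on the same rq.
 * skip_first_update models the "delayed but not eligible" branch
 * that does not call update_rq_avg_idle().
 */
static uint64_t toy_scenario(int skip_first_update)
{
	struct toy_rq rq = { .clock = 1000, .idle_stamp = 0, .avg_idle = 0 };

	toy_go_idle(&rq);
	rq.clock += 100;		/* first enqueue after 100ns of idle */
	if (!skip_first_update)
		toy_update_avg_idle(&rq);

	rq.clock += 1000;		/* CPU is busy, not idle */
	toy_update_avg_idle(&rq);	/* second wakeup */

	return rq.avg_idle;
}
```

In this model toy_scenario(0) returns 12 (only the 100ns idle period
is sampled), while toy_scenario(1) returns 137 because the second
wakeup still sees the old idle_stamp and counts the 1000ns of busy
time as idle.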
> +	} else {
> +		update_rq_avg_idle(rq);
> +	}
>  }
>
> /*
--
Thanks and Regards,
Prateek