Re: [PATCH v6 2/2] sched: update the rq->avg_idle when a task is moved to an idle CPU

From: Dietmar Eggemann
Date: Wed Dec 17 2025 - 11:23:36 EST


On 15.12.25 10:35, Shijie Huang wrote:
>
> On 12/12/2025 22:22, Dietmar Eggemann wrote:
>>>> So you could move (1) back to (2) avoiding the 'if rq->idle_stamp' for
>>>> the sched_change pattern for instance?
>>> Could you please tell me what is "avoiding the 'if rq->idle_stamp' for
>>> the sched_change pattern" ?
>>>
>>> Sorry, I do not understand your meaning.
>> sched_change uses dequeue_task()/enqueue_task() for a queued task to
>> change prio, policy, sched params, taskgroups, etc.
>
> For sched_change, the dequeue_task()/enqueue_task() only work when
> the queued task has TASK_ON_RQ_QUEUED flags. The TASK_ON_RQ_QUEUED
> is set in activate_task().

I guess this was a misunderstanding. It works because of the 'if
(rq->idle_stamp)' check and setting 'rq->idle_stamp = 0' within the
condition, but this condition isn't worth checking in certain places
where we actually call enqueue_task().

>
>   1.) For this active task, if the sched_change makes it dequeue_task()/
>       enqueue_task() on the current CPU, it's okay. Since the current CPU
>       is not in newidle, "rq->idle_stamp" is 0 in this case.
>
>       This patch works fine.
>
>   2.) For this active task, if the sched_change makes it dequeue_task()/
>       enqueue_task() on another CPU, it's okay too.
>
>       2.1) If the other CPU's idle_stamp is 0, the other CPU is busy now.
>            The sched_change works fine with this patch.
>
>       2.2) If the other CPU's idle_stamp is not 0, the sched_change also
>            works fine with this patch, since the sched_change is breaking
>            the idle state of the other CPU by moving an active task to an
>            idle CPU. It makes sense.

Not sure about this. So far I thought that the sched_change pattern does
a task dequeue + enqueue on the same CPU (this CPU or another one)? So
you can't come out of idle here. We lock the rq before we call
scoped_guard (sched_change, ...).
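
To make the same-CPU point concrete, here is a toy userspace model of the
dequeue + enqueue pattern described above. It is only a sketch: the toy_*
types and functions are hypothetical stand-ins, not the kernel's struct rq,
dequeue_task(), enqueue_task() or the sched_change guard, and the 'locked'
flag merely models "rq lock held".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins; none of these are the real kernel types. */
struct toy_rq {
	bool locked;             /* models "rq lock held" */
	unsigned int nr_running; /* queued tasks on this rq */
	uint64_t idle_stamp;     /* 0 means the CPU is not (new)idle */
};

struct toy_task {
	struct toy_rq *rq;       /* rq the task lives on */
	bool queued;             /* models the TASK_ON_RQ_QUEUED state */
	int prio;
};

static void toy_dequeue_task(struct toy_rq *rq, struct toy_task *p)
{
	assert(rq->locked && p->queued);
	p->queued = false;
	rq->nr_running--;
}

static void toy_enqueue_task(struct toy_rq *rq, struct toy_task *p)
{
	assert(rq->locked && !p->queued);
	p->queued = true;
	rq->nr_running++;
}

/* Change one attribute of a task in place, sched_change style. */
static void toy_sched_change_prio(struct toy_task *p, int prio)
{
	struct toy_rq *rq = p->rq;
	bool was_queued;

	rq->locked = true;               /* lock is taken before the change */
	was_queued = p->queued;
	if (was_queued)
		toy_dequeue_task(rq, p);
	p->prio = prio;                  /* the actual attribute change */
	if (was_queued)
		toy_enqueue_task(rq, p); /* same rq as the dequeue above */
	rq->locked = false;
}
```

In this model the task never leaves its rq: nr_running and idle_stamp are
the same before and after toy_sched_change_prio(), so an idle_stamp check
in this enqueue path would have nothing to do.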

I think Vincent is right in saying that update_rq_avg_idle() should be
put into put_prev_task_idle() instead.
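
A rough userspace sketch of that placement, under stated assumptions: only
the names update_rq_avg_idle() and put_prev_task_idle() come from this
thread. The body below is an assumption modelled on the existing
'if (rq->idle_stamp)' handling (1/8-weight running average, clamp to
2 * max_idle_balance_cost, then clear idle_stamp); the actual patch may
differ. All toy_* names and the 'clock' field are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for the few rq fields discussed in this thread. */
struct toy_idle_rq {
	uint64_t clock;                 /* stand-in for rq_clock(rq) */
	uint64_t idle_stamp;            /* 0 means "not idle" */
	uint64_t avg_idle;
	uint64_t max_idle_balance_cost;
};

/* Running average: move avg 1/8 of the way towards sample. */
static void toy_update_avg(uint64_t *avg, uint64_t sample)
{
	int64_t diff = (int64_t)(sample - *avg);

	*avg += diff / 8;
}

/* Assumed body; only the function name is taken from the thread. */
static void toy_update_rq_avg_idle(struct toy_idle_rq *rq)
{
	uint64_t delta, max;

	if (!rq->idle_stamp)
		return;                 /* CPU was not idle, nothing to do */

	assert(rq->clock >= rq->idle_stamp);
	delta = rq->clock - rq->idle_stamp;
	max = 2 * rq->max_idle_balance_cost;

	toy_update_avg(&rq->avg_idle, delta);
	if (rq->avg_idle > max)
		rq->avg_idle = max;     /* keep avg_idle bounded */

	rq->idle_stamp = 0;             /* the idle period is over */
}

/* Hypothetical hook: the idle task is being switched out. */
static void toy_put_prev_task_idle(struct toy_idle_rq *rq)
{
	toy_update_rq_avg_idle(rq);
}
```

With the update tied to the idle task being switched out, it runs once per
idle period in this model, and callers of enqueue_task() such as the
sched_change pattern do not need the idle_stamp check at all.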

Still waiting for the DCPerf Mediawiki test results to see if this
change fixes my 'rq->avg_idle being too big' issue.