Re: [PATCH v6 2/2] sched: update the rq->avg_idle when a task is moved to an idle CPU
From: Dietmar Eggemann
Date: Wed Dec 17 2025 - 11:23:36 EST

On 15.12.25 10:35, Shijie Huang wrote:
>
> On 12/12/2025 22:22, Dietmar Eggemann wrote:
>>>> So you could move (1) back to (2) avoiding the 'if rq->idle_stamp' for
>>>> the sched_change pattern for instance?
>>> Could you please tell me what is "avoiding the 'if rq->idle_stamp' for
>>> the sched_change pattern" ?
>>>
>>> Sorry, I do not understand your meaning.
>> sched_change uses dequeue_task()/enqueue_task() for a queued task to
>> change prio, policy, sched params, taskgroups, etc.
>
> For sched_change, the dequeue_task()/enqueue_task() only work when
>
> the queued task has TASK_ON_RQ_QUEUED flags. The TASK_ON_RQ_QUEUED
>
> is set in activate_task().

I guess this was a misunderstanding. It works because of 'if
(rq->idle_stamp)' and setting 'rq->idle_stamp = 0' within the condition
but this condition isn't worth checking in certain places where we
actually call enqueue_task().

>
> 1.) For this active task, if the sched_change makes it dequeue_task()/
> enqueue_task() on
>
> current CPU, it's okay. Since current CPU is not in the newidle,
> the "rq->idle_stamp" is 0 at this case.
>
> This patch works fine.
>
>
> 2.) For this active task, if the sched_change makes it dequeue_task()/
> enqueue_task() on an another CPU,
>
> it's okay too.
>
> 2.1) If the another CPU's idle_stamp is 0, the another CPU is
> busy now.
>
> The sched_change works fine with this patch.
>
> 2.2) If the another CPU's idle_stamp is not 0, the sched_change
> also works fine with this patch.
>
> Since the sched_change is breaking the idle state of the
> another CPU by moving an active
>
> task to an idle CPU. It makes sense.

Not sure about this. I thought so far that the sched_change pattern is
doing a task dequeue + enqueue on the same CPU (this CPU or another one)? So
you can't come out of idle here. We lock the rq before we call
scoped_guard (sched_change, ...).

I think Vincent is right in saying that update_rq_avg_idle() should be
put into put_prev_task_idle() instead.

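
To illustrate the point, here is a user-space toy model, only a sketch and
not kernel code. All toy_* names are made up for this mail. The averaging
and the clamp to 2 * max_idle_balance_cost are modelled on what
ttwu_do_activate() does today as far as I remember it. Whether the real
helper in put_prev_task_idle() still needs the idle_stamp guard is an
assumption here.

```c
#include <stdint.h>

/* Toy stand-in for struct rq; only the fields this discussion touches. */
struct toy_rq {
	uint64_t clock;                 /* stand-in for rq_clock(rq) */
	uint64_t idle_stamp;            /* non-zero while the CPU sits in (new)idle */
	uint64_t avg_idle;
	uint64_t max_idle_balance_cost;
	unsigned int nr_running;
};

/* Running average with weight 1/8, as update_avg() does. */
static void toy_update_avg(uint64_t *avg, uint64_t sample)
{
	int64_t diff = (int64_t)(sample - *avg);

	*avg += diff / 8;
}

/* Hypothetical helper: fold the finished idle period into avg_idle. */
static void toy_update_rq_avg_idle(struct toy_rq *rq)
{
	uint64_t delta = rq->clock - rq->idle_stamp;
	uint64_t max = 2 * rq->max_idle_balance_cost;

	toy_update_avg(&rq->avg_idle, delta);
	if (rq->avg_idle > max)
		rq->avg_idle = max;
	rq->idle_stamp = 0;
}

/*
 * sched_change pattern: dequeue + enqueue of a queued task on the same rq
 * with the rq locked. The CPU does not leave idle here, so there is no
 * idle_stamp check on this path.
 */
static void toy_sched_change(struct toy_rq *rq)
{
	rq->nr_running--;       /* dequeue_task() */
	/* change prio, policy, sched params, ... */
	rq->nr_running++;       /* enqueue_task() */
}

/* The idle task is switched out: the one place where idle really ends. */
static void toy_put_prev_task_idle(struct toy_rq *rq)
{
	if (rq->idle_stamp)     /* guard is an assumption, see above */
		toy_update_rq_avg_idle(rq);
}
```

With clock = 1000, idle_stamp = 200 and avg_idle = 0, toy_sched_change()
leaves idle_stamp and avg_idle alone, and toy_put_prev_task_idle() then
folds the 800 delta in as 800 / 8 = 100 and clears idle_stamp.
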
Still waiting for the DCPerf Mediawiki test results to see if this
change fixes my 'rq->avg_idle being too big' issue.