Re: [PATCH v6 2/2] sched: update the rq->avg_idle when a task is moved to an idle CPU
From: Shijie Huang
Date: Tue Dec 16 2025 - 04:51:30 EST
On 16/12/2025 16:47, Vincent Guittot wrote:
On Tue, 16 Dec 2025 at 08:39, Shijie Huang
<shijie@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
On 16/12/2025 15:17, Vincent Guittot wrote:
On Tue, 16 Dec 2025 at 07:22, Shijie Huang
<shijie@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
On 13/12/2025 09:36, Vincent Guittot wrote:
put_prev_task_idle() would be a better place to call
update_rq_avg_idle() because this is when we leave idle.
update_rq_avg_idle() is called not only by the current CPU but also by
other CPUs. For example, in try_to_wake_up(), update_rq_avg_idle() is
called by other CPUs. So enqueue_task() is a good place.
But put_prev_task_idle() is called by the local CPU whenever it leaves
idle, so instead of trying to catch all the places that could make the CPU
leave idle, it's better to use this single place.
And as you mentioned, put_prev_task_idle() is only called by the local CPU,
whereas enqueue_task() can be called by any CPU, creating useless
pressure on the variable.
rq->idle_stamp is set in sched_balance_newidle(), and then we call
update_rq_avg_idle() in put_prev_task_idle(). How can we update
rq->avg_idle?
I'm not sure I understand your point.
rq->avg_idle tracks idle time. The easiest way would be to use
- set_next_task_idle() when we enter idle
- put_prev_task_idle() when we exit idle
Except that sched_balance_newidle() can take a long time, and that time
should be accounted as idle time too. So instead of using
set_next_task_idle(), we use sched_balance_newidle() to set
rq->idle_stamp, which is okay because sched_balance_newidle() is always
called before going idle.
Thanks for the explanations.
It seems that put_prev_task_idle() really is a better place to call
update_rq_avg_idle(). Let me think about it for a while :)
Thanks
Huang Shijie