Re: [PATCH v6 2/2] sched: update the rq->avg_idle when a task is moved to an idle CPU
From: Dietmar Eggemann
Date: Thu Dec 11 2025 - 11:15:50 EST
On 09.12.25 10:45, Huang Shijie wrote:
> In the newidle balance, the rq->idle_stamp may be set to a non-zero value
> if it cannot pull any task.
>
> In the wakeup path, it detects a non-zero rq->idle_stamp, updates
> rq->avg_idle, and then ends the CPU idle status by setting rq->idle_stamp
> to zero.
>
> Besides the wakeup, the current code does not end the CPU idle status
> when a task is moved to the idle CPU, such as on fork/clone, execve,
> or in other cases. In order to get a more accurate rq->avg_idle,
> we need to update it in more places (not only the wakeup).
>
> This patch introduces a helper, update_rq_avg_idle(), and uses it
> in enqueue_task(), so that rq->avg_idle is updated
> when a task is moved to an idle CPU at:
> -- wakeup
> -- fork/clone
> -- execve
> -- idle balance
> -- other cases
[...]
In v2 you moved update_rq_avg_idle() (1) from activate_task() (2) to
enqueue_task() to possibly handle delayed tasks.

In v3 you figured there can't be any delayed task on a CPU when it sets
rq->idle_stamp in sched_balance_newidle().

So couldn't you move (1) back to (2), avoiding the 'if rq->idle_stamp'
check for the sched_change pattern, for instance?

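
For reference, here is a minimal userspace sketch of what such a helper
presumably does, modelled on the existing idle_stamp handling in the wakeup
path. struct rq_model and its fields are stand-ins for illustration, not the
kernel's struct rq, and the body of update_rq_avg_idle() in the actual patch
may differ.

```c
#include <stdint.h>

/* Hypothetical userspace stand-in for the few rq fields involved. */
struct rq_model {
	uint64_t clock;                 /* current rq clock, in ns */
	uint64_t idle_stamp;            /* non-zero while the CPU counts as idle */
	uint64_t avg_idle;              /* running average of idle periods */
	uint64_t max_idle_balance_cost; /* avg_idle is capped at twice this */
};

/* Move the average 1/8 of the way towards the new sample. */
static void update_avg(uint64_t *avg, uint64_t sample)
{
	int64_t diff = (int64_t)(sample - *avg);

	*avg += diff / 8;
}

/* Assumed shape of the helper: fold the idle period in, then end idle. */
static void update_rq_avg_idle(struct rq_model *rq)
{
	uint64_t delta, max;

	if (!rq->idle_stamp)
		return; /* CPU was not marked idle, nothing to account */

	delta = rq->clock - rq->idle_stamp;
	max = 2 * rq->max_idle_balance_cost;

	update_avg(&rq->avg_idle, delta);
	if (rq->avg_idle > max)
		rq->avg_idle = max;

	rq->idle_stamp = 0; /* idle period is over */
}
```

The early return is the 'if rq->idle_stamp' check in question: wherever the
helper is called from, it is a no-op unless the newidle balance left
idle_stamp set.
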
I tried to understand whether this patch could help with an issue we
currently have with the 'DCPerf Mediawiki' benchmark: on 2 comparable
servers, one shows a 10% lower CPU utilization. I traced it down to
significantly different behaviour in sched_balance_newidle(), and further
down to:

	if (!get_rd_overloaded(this_rq->rd) || this_rq->avg_idle <
	    sd->max_newidle_lb_cost)

where the one with the lower CPU utilization bails out way less often
(because this_rq->avg_idle is very high, system is overloaded). But so far
I haven't been able to tell whether the patch helps here. Anyway, we'll use
this patch for another test run right now.
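
To spell out the condition, here is a small standalone model of it. The
function name and parameters are made up for illustration and only mirror
the expression quoted above, not the full kernel code around it.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the early bail-out quoted above: skip the newidle
 * balance when the root domain is not overloaded, or when the expected idle
 * time is shorter than the cost of balancing.
 */
static bool newidle_bails_out(bool rd_overloaded, uint64_t avg_idle,
			      uint64_t max_newidle_lb_cost)
{
	return !rd_overloaded || avg_idle < max_newidle_lb_cost;
}
```

On an overloaded system the first term is always false, so the outcome
depends only on avg_idle: a very high avg_idle means the balance is
attempted on nearly every newidle event.
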