Re: [PATCH v7 1/1] sched: update the rq->avg_idle when a task is moved to an idle CPU

From: Valentin Schneider
Date: Fri Jan 09 2026 - 08:25:23 EST


On 09/01/26 16:06, K Prateek Nayak wrote:
> Hello Valentin,
>
> On 1/9/2026 2:42 PM, Valentin Schneider wrote:
>> On 26/12/25 14:32, Huang Shijie wrote:
>>> --- a/kernel/sched/core.c
>>> +++ b/kernel/sched/core.c
>>> @@ -3609,6 +3609,21 @@ static inline void ttwu_do_wakeup(struct task_struct *p)
>>> trace_sched_wakeup(p);
>>> }
>>>
>>> +void update_rq_avg_idle(struct rq *rq)
>>> +{
>>> + if (rq->idle_stamp) {
>>> + u64 delta = rq_clock(rq) - rq->idle_stamp;
>>> + u64 max = 2*rq->max_idle_balance_cost;
>>> +
>>> + update_avg(&rq->avg_idle, delta);
>>> +
>>> + if (rq->avg_idle > max)
>>> + rq->avg_idle = max;
>>> +
>>> + rq->idle_stamp = 0;
>>> + }
>>> +}
>>> +
>>
>> So if we have this invoked every time we switch to the idle task via
>> put_prev_task_idle(), do we want to move sched_balance_newidle()'s update
>> of rq->idle_stamp() to set_next_task_idle()?
>> > That does change the behaviour as we'd now record any idle duration as
>> opposed to only idle-from-fair duration, but that would mean we'd
>> unconditionally record a rq->idle_stamp and could thus ditch the if{} clause.
>
> So I'm a wee bit skeptical of this - the avg_idle also serves as a
> bailout for newidle_balance(). If a tasks keeps waking up during newidle
> balance, we would like to discourage further attempts of newidle balance
> for a while to avoid CPU being stuck doing newidle balance while having
> runnable tasks waken up on it.
>
> There is no bailout past should_we_balance(), and for large domains, it
> can take a while to get out of balancing.
>
> If we move this to {put_prev,set_next}_task_idle(), we'll completely
> fail to capture that part of newidle balance bailout and I'm afraid
> we'll start doing newidle balance more aggressively.
>

Ah you're right, I'd forgotten about this, there's even a comment above the
idle_stamp update pointing this out... Although AFAICT that means we'd end
up with smaller rq->avg_idle and thus would newidle_balance() less often;
regardless that is a behaviour change.

> I'll get some data over the weekend for the different variants being
> discussed here - if it doesn't reveal anything drastic, we can
> consider moving this accounting to idle task's switch.
>
> --
> Thanks and Regards,
> Prateek