Re: [PATCH v2] sched/fair: Fix cpu_util runnable_avg arithmetic

From: Dietmar Eggemann

Date: Tue Jun 09 2026 - 03:04:55 EST

On 05.06.26 15:15, Hongyan Xia wrote:
> On 6/5/2026 6:35 PM, Dietmar Eggemann wrote:
>> On 05.06.26 11:43, Hongyan Xia wrote:
>>> From: Hongyan Xia <hongyan.xia@xxxxxxxxxxxxx>
>>>
>>> If we take runnable_avg in max(runnable_avg, util_avg) in cpu_util(), we
>>> should then add or subtract task runnable_avg, but the arithmetic below
>>> is still with task util_avg. This mixes runnable_avg with util_avg which
>>> is incorrect.
>>>
>>> Fix by always doing arithmetic with runnable_avg and only take
>>> max(runnable_avg, util_avg) at the last step.
>>>
>>> Fixes: 7d0583cf9ec7 ("sched/fair, cpufreq: Introduce 'runnable boosting'")
>>> Signed-off-by: Hongyan Xia <hongyan.xia@xxxxxxxxxxxxx>
>>
>> Does this fix the issue in EAS energy calculation you mentioned
>> initially? We now add/subtract task rbl_avg from CPU rbl_avg but can we
>> now use this value correctly in util_avg based EAS?
>
> It does improve things a bit. It used to occasionally pile up tasks on
> the same CPU, which happens less after this patch. At least it now gets
> the maths right so this fix should probably be in regardless.

Just to map this into the code:

--- EAS ---:

compute_energy(..., p, dst_cpu)

  max_util = eenv_pd_max_util(..., p, dst_cpu)

    for_each_cpu(cpu, pd_cpus)

      util = cpu_util(cpu, p, dst_cpu, 1)
                                       ^ boost

  energy = em_pd_get_efficient_state(..., max_util)

    for (i = min_ps; i <= max_ps; i++)

      if (ps->performance >= max_util)
        return i
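
As a rough user-space sketch of that last loop: struct perf_state and
pick_efficient_state() are made-up names for illustration, and the
skipping of inefficient states done by the real helper is left out. It
only shows that the first performance state covering max_util wins, so
an overestimated max_util lands on a higher state.

```c
#include <stddef.h>

/* Hypothetical, reduced stand-in for the kernel's struct em_perf_state. */
struct perf_state {
	unsigned long performance;	/* capacity delivered at this state */
	unsigned long frequency;	/* kHz, illustrative values only */
};

/*
 * Return the index of the first state in [min_ps, max_ps] whose
 * performance covers max_util; fall back to max_ps if none does.
 */
static int pick_efficient_state(const struct perf_state *table,
				int min_ps, int max_ps,
				unsigned long max_util)
{
	int i;

	for (i = min_ps; i <= max_ps; i++) {
		if (table[i].performance >= max_util)
			return i;
	}

	return max_ps;
}
```

With performance values of 128, 256, 512 and 1024, a max_util of 200
selects index 1 while a max_util of 500 selects index 2.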

--- schedutil ---:

sugov_get_util(cpu)

  util += cpu_util_cfs_boost(cpu)

    util = cpu_util(cpu, NULL, -1, 1) --> p == NULL, dst_cpu == -1
                                   ^ boost

This can help to calculate a more correct max_util value on the EAS-side,
but won't change schedutil?

I agree that it's more correct to add/subtract task runnable in case of
migration but using runnable instead of util vs capacity is still not
'correct'?
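
To put numbers on the two points above, here is a minimal user-space
sketch. It is my reading of the patch description, not the patch itself:
all struct and function names are made up, and util_est and the capacity
clamp are left out. cpu_util_mixed() takes the max first and then
adjusts with task util_avg; cpu_util_fixed() adjusts each signal with
its own task contribution and takes the max last.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the PELT signals of a cfs_rq or a task. */
struct sig {
	unsigned long util_avg;
	unsigned long runnable_avg;
};

struct task {
	int cpu;		/* CPU the task is currently on */
	struct sig avg;
};

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* -1: p leaves @cpu, +1: p arrives on @cpu, 0: @cpu is not affected. */
static int migration_delta(int cpu, const struct task *p, int dst_cpu)
{
	if (!p)
		return 0;
	if (p->cpu == cpu && dst_cpu != cpu)
		return -1;
	if (p->cpu != cpu && dst_cpu == cpu)
		return 1;
	return 0;
}

/* Add or remove a task contribution; the removal saturates at zero. */
static unsigned long adjust(unsigned long cpu_sig, unsigned long task_sig,
			    int delta)
{
	if (delta < 0)
		return cpu_sig > task_sig ? cpu_sig - task_sig : 0;
	if (delta > 0)
		return cpu_sig + task_sig;
	return cpu_sig;
}

/* Max first, then adjust with task util_avg: mixes the two signals. */
static unsigned long cpu_util_mixed(const struct sig *cfs, int cpu,
				    const struct task *p, int dst_cpu,
				    int boost)
{
	unsigned long util = cfs->util_avg;

	if (boost)
		util = max_ul(util, cfs->runnable_avg);

	return adjust(util, p ? p->avg.util_avg : 0,
		      migration_delta(cpu, p, dst_cpu));
}

/* Adjust each signal with its own task contribution, max at the end. */
static unsigned long cpu_util_fixed(const struct sig *cfs, int cpu,
				    const struct task *p, int dst_cpu,
				    int boost)
{
	int delta = migration_delta(cpu, p, dst_cpu);
	unsigned long util = adjust(cfs->util_avg,
				    p ? p->avg.util_avg : 0, delta);
	unsigned long runnable;

	if (!boost)
		return util;

	runnable = adjust(cfs->runnable_avg,
			  p ? p->avg.runnable_avg : 0, delta);

	return max_ul(util, runnable);
}
```

Take a CPU with util_avg = 300 and runnable_avg = 600, and a task on it
with util_avg = 100 and runnable_avg = 400. Migrating the task away
gives 600 - 100 = 500 with the mixed arithmetic and max(200, 200) = 200
with the consistent one. With p == NULL and dst_cpu == -1, as in the
schedutil path, both variants return 600.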

>> How do you want to solve the power consumption regression in your
>> low-power use cases? Since you mentioned per-CPU tasks in those
>> contention scenarios (per-CPU worker vs producer *), do you plan to only
>> use boost in cpu_util() in case the affinity of p (worker) is not
>> constrained? Not sure whether the consumer (CPU affinity not
>> constrained) also has rbl_avg > util_avg?
>
> This patch only gets the maths right but the energy regression is still
> big because frequency hasn't changed. The producer-consumer is only the
> worst offender, not the only one. Trouble is that runnable_avg is just a
> big number to deal with in general, and you could easily double or
> triple your frequency if you have many small threads around (which is
> the case in our mobile cases).
>
> We haven't found a good solution to solve it completely. I keep
> wondering if there could be a better metric than raw runnable_avg. One
> that is not so big in magnitude and does a much better job to tell the
> true contention where boosting frequency helps.
>
>> *
>> https://lore.kernel.org/r/4adbab4d-f9e4-4354-aa1e-48f11b1fd208@xxxxxxxxxxxxx
>>
>>
>> [...]