Re: [PATCH v2] sched/fair: Fix cpu_util runnable_avg arithmetic
From: Hongyan Xia
Date: Tue Jun 09 2026 - 04:44:50 EST
On 6/9/2026 3:04 PM, Dietmar Eggemann wrote:
> On 05.06.26 15:15, Hongyan Xia wrote:
>> On 6/5/2026 6:35 PM, Dietmar Eggemann wrote:
>>> On 05.06.26 11:43, Hongyan Xia wrote:
>>>> From: Hongyan Xia <hongyan.xia@xxxxxxxxxxxxx>
>>>>
>>>> If we take runnable_avg in max(runnable_avg, util_avg) in cpu_util(), we
>>>> should then add or subtract task runnable_avg, but the arithmetic below
>>>> is still with task util_avg. This mixes runnable_avg with util_avg which
>>>> is incorrect.
>>>>
>>>> Fix by always doing arithmetic with runnable_avg and only taking
>>>> max(runnable_avg, util_avg) at the last step.
>>>>
>>>> Fixes: 7d0583cf9ec7 ("sched/fair, cpufreq: Introduce 'runnable boosting'")
>>>> Signed-off-by: Hongyan Xia <hongyan.xia@xxxxxxxxxxxxx>
>>>
>>> Does this fix the issue in EAS energy calculation you mentioned
>>> initially? We now add/subtract task rbl_avg from CPU rbl_avg but can we
>>> now use this value correctly in util_avg based EAS?
>>
>> It does improve things a bit. It used to occasionally pile up tasks on
>> the same CPU, which happens less after this patch. At least it now gets
>> the maths right so this fix should probably be in regardless.
>
> Just to map this into the code:
>
> --- EAS ---:
>
> compute_energy(..., p, dst_cpu)
>
>   max_util = eenv_pd_max_util(..., p, dst_cpu)
>
>     for_each_cpu(cpu, pd_cpus)
>
>       util = cpu_util(cpu, p, dst_cpu, 1)
>                                        ^ boost
>
>   energy = em_pd_get_efficient_state(..., max_util)
>
>     for (i = min_ps; i <= max_ps; i++)
>
>       if (ps->performance >= max_util)
>         return i
>
>
> --- schedutil ---:
>
> sugov_get_util(cpu)
>
>   util += cpu_util_cfs_boost(cpu)
>
>     util = cpu_util(cpu, NULL, -1, 1)  --> p == NULL, dst_cpu == -1
>                                    ^ boost
>
>
>
> This can help to calculate a more correct max_util value on the EAS-side,
> but won't change schedutil?
Yes. The cpufreq side calls cpu_util() with p == NULL, so this patch
doesn't change how frequency is calculated.
> I agree that it's more correct to add/subtract task runnable in case of
> migration but using runnable instead of util vs capacity is still not
> 'correct'?
I guess we have two problems.

1. Adding/subtracting util from runnable_avg in EAS.
2. Using runnable_avg for frequency.

1 is incorrect, which I think we all agree on. For 2, I'm not sure I want
to call it 'wrong'; it's just that runnable_avg can grow really big
really quickly, and we want a more sensible value.
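
To make problem 1 concrete, here is a minimal user-space sketch of the
arithmetic. It is not the kernel code: struct pelt_sig, enum task_move
and both function names are made up for illustration, and util_est and
the capacity clamp are left out. MOVE_NONE stands for the schedutil
call with p == NULL.

```c
/* Simplified user-space model; not the kernel implementation. */
struct pelt_sig {			/* hypothetical stand-in for PELT averages */
	unsigned long util_avg;
	unsigned long runnable_avg;
};

/* Hypothetical: how task p relates to the CPU being evaluated. */
enum task_move { MOVE_NONE, MOVE_AWAY, MOVE_HERE };

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* Saturating subtract, in the spirit of lsub_positive(). */
static unsigned long sub_positive(unsigned long a, unsigned long b)
{
	return a > b ? a - b : 0;
}

/* Mixed arithmetic: take the max first, then adjust by task util_avg only. */
static unsigned long cpu_util_mixed(struct pelt_sig cpu, struct pelt_sig task,
				    enum task_move m)
{
	unsigned long util = max_ul(cpu.util_avg, cpu.runnable_avg);

	if (m == MOVE_AWAY)
		util = sub_positive(util, task.util_avg);
	else if (m == MOVE_HERE)
		util += task.util_avg;

	return util;
}

/* Consistent arithmetic: adjust each signal by its own task value, max last. */
static unsigned long cpu_util_fixed(struct pelt_sig cpu, struct pelt_sig task,
				    enum task_move m)
{
	unsigned long util = cpu.util_avg;
	unsigned long runnable = cpu.runnable_avg;

	if (m == MOVE_AWAY) {
		util = sub_positive(util, task.util_avg);
		runnable = sub_positive(runnable, task.runnable_avg);
	} else if (m == MOVE_HERE) {
		util += task.util_avg;
		runnable += task.runnable_avg;
	}

	return max_ul(util, runnable);
}
```

With made-up numbers, a source CPU at util_avg 200 / runnable_avg 600
and a task at util_avg 50 / runnable_avg 300, moving the task away gives
550 with the mixed arithmetic and 300 with the consistent one. With
MOVE_NONE both return 600, which matches frequency selection being
unaffected.
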
>>> How do you want to solve the power consumption regression in your
>>> low-power use cases? Since you mentioned per-CPU tasks in those
>>> contention scenarios (per-CPU worker vs producer *), do you plan to only
>>> use boost in cpu_util() in case the affinity of p (worker) is not
>>> constrained? Not sure whether the consumer (CPU affinity not
>>> constrained) also has rbl_avg > util_avg?
>>
>> This patch only gets the maths right but the energy regression is still
>> big because frequency hasn't changed. The producer-consumer is only the
>> worst offender, not the only one. Trouble is that runnable_avg is just a
>> big number to deal with in general, and you could easily double or
>> triple your frequency if you have many small threads around (which is
>> the case in our mobile cases).
>>
>> We haven't found a good solution to solve it completely. I keep
>> wondering if there could be a better metric than raw runnable_avg. One
>> that is not so big in magnitude and does a much better job of
>> identifying true contention where boosting frequency helps.
>>
>>> *
>>> https://lore.kernel.org/r/4adbab4d-f9e4-4354-aa1e-48f11b1fd208@xxxxxxxxxxxxx
>>>
>>>
>>> [...]