Re: [PATCH] sched/fair: Revert boost in cpu_util()
From: Hongyan Xia
Date: Tue May 26 2026 - 03:36:02 EST
On 5/18/2026 6:04 PM, Christian Loehle wrote:
> On 5/18/26 03:40, hongyan.xia(夏弘彦) wrote:
>> From: Hongyan Xia <hongyan.xia@xxxxxxxxxxxxx>
>>
>> We have seen a massive power consumption regression (20% SoC power
>> increase in many apps) after updating our kernel. After bisection we
>> pinpointed the regression to the cpu_util(boost) feature. After
>> reverting the boost feature the massive energy regression is gone.
>> Detailed trace analysis is below. The regression shows up across many
>> apps, but YouTube is one of the worst offenders, as shown in the
>> 1080p60fps video benchmark:
>>
>> Setup       FPS     SoC Power (mW)   diff
>> w/ boost    59.94   913.6
>> w/o boost   59.93   720.4            -21.15%
>>
>> Signed-off-by: Hongyan Xia <hongyan.xia@xxxxxxxxxxxxx>
>>
>> ---
>> Analysis:
>>
>> We found several problems that result in the power spike:
>>
>> 1. Arithmetic should not happen between util_avg and runnable_avg:
>>
>> After util = max(util, runnable) which potentially picks runnable value
>> in cpu_util(), we then add or subtract task util values from it. This
>> produces a value that is half-runnable-half-util which is ill-defined.
>> This alone should be a warning sign. This breaks EAS calculations in
>> many cases, leading to sub-optimal task placements.
>>
>> 2. Using the absolute value of runnable_avg to drive frequency is
>> too high to be reasonable:
>>
>> We use runnable in a _relative_ way to util to know whether there is
>> contention in several places. However, the _absolute_ value should not
>> be used like util. Runnable_avg tends to be significantly higher,
>> making it much easier to saturate frequency.
>>
>> For example, if three tasks each with a util of 100 contend on the same
>> rq, the rq util is 300 but runnable_avg shoots up to 900. 900 drives the
>> CPU at the max frequency, and it's highly questionable whether this
>> boost is the right decision.
>>
>> 3. Runnable_avg may not even reflect true contention:
>>
>> When tasks are dependent, the bottleneck is often the data flow between
>> tasks, not the contention seen by runnable_avg. Boosting frequency with
>> runnable in such scenarios wastes power without performance benefits.
>>
>> We found that 1 causes a minor power regression, but 2 and 3 regress
>> power significantly. We have seen multiple applications that use the
>> producer-consumer model with many worker threads suffer. When there is
>> IPC between producer and consumer, boosting frequency blindly does not
>> help performance at all if the consumer is limited by how much data
>> flows through. YouTube suffers from 1, 2 and 3 at the same time,
>> leading to the total SoC power regression of 20% shown in the results
>> above.
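To make points 1 and 2 above concrete, here is a minimal Python sketch.
It is a toy model and not the kernel code: rq_signals(),
cpu_util_model() and freq_fraction() are hypothetical helper names, the
model assumes the tasks wake together and share the CPU round-robin so
that each one stays runnable for the whole busy window, and the 1.25
factor is an assumed frequency headroom of the kind schedutil applies.

```python
SCHED_CAPACITY_SCALE = 1024  # full capacity of the biggest CPU


def rq_signals(task_utils):
    """Toy model (assumption): every task is runnable for as long as any
    task on the rq is running, so each contributes the whole busy window
    to runnable while contributing only its own share to util."""
    util = sum(task_utils)
    runnable = len(task_utils) * util
    return util, runnable


def cpu_util_model(rq_util, rq_runnable, task_util=0,
                   migrate_out=False, migrate_in=False, boost=False,
                   capacity=SCHED_CAPACITY_SCALE):
    """Simplified stand-in for cpu_util(); util_est is ignored."""
    util = max(rq_util, rq_runnable) if boost else rq_util
    # Task util is added to or subtracted from whatever max() picked,
    # so with boost the result can be runnable minus util (point 1).
    if migrate_out:
        util = max(util - task_util, 0)
    elif migrate_in:
        util += task_util
    return min(util, capacity)


def freq_fraction(util, capacity=SCHED_CAPACITY_SCALE, headroom=1.25):
    """Requested frequency as a fraction of max, clamped to 1.0.
    The headroom value is an assumption of this sketch."""
    return min(headroom * util / capacity, 1.0)


util, runnable = rq_signals([100, 100, 100])
print("rq util:", util, "rq runnable:", runnable)
print("freq from util:     %.2f of max" % freq_fraction(util))
print("freq from runnable: %.2f of max" % freq_fraction(runnable))

# Estimating the rq after moving one task away:
print("without boost:", cpu_util_model(util, runnable, 100, migrate_out=True))
print("with boost:   ", cpu_util_model(util, runnable, 100, migrate_out=True,
                                        boost=True))
```

In this model the util-driven request stays well below the max
frequency while the runnable-driven one saturates, and the boosted
migration estimate mixes a runnable value with a util value.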
>
> We did discuss removing runnable boost internally as well, but I’d love to see
> more data too.
> The original issue it was trying to solve was avoiding jank frames during load
> spikes, which YouTube does not really exercise. Some gaming workload data would
> therefore be a useful addition here.

Here are some gaming numbers for popular mobile games played by our
customers, still on the Dimensity 8400 like all the earlier numbers.
sdev stands for standard deviation.

Mobile Legends
            FPS      sdev   total power   diff
w/o boost   120.07   0.56   2996.09mW
w/ boost    120.16   0.47   3294.10mW     +9.95%

Genshin Impact (medium quality)
            FPS      sdev   total power   diff
w/o boost   60.03    0.35   5695.46mW
w/ boost    60.05    0.34   6215.84mW     +9.14%

Genshin Impact (high quality)
            FPS      sdev   total power   diff
w/o boost   60.06    0.27   6356.43mW
w/ boost    60.08    0.27   6672.77mW     +4.98%

It looks like the boost feature is not very useful in these games. The
average FPS is roughly the same, and even the standard deviation barely
changes. These numbers are total phone power, so if you count only the
SoC, the relative power increase will be much higher than 10%. The
problem is that these games are capable of running above the FPS limit,
so boosting won't help at all.
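For anyone who wants to re-derive the diff column, a minimal Python
sketch follows. The power values are the ones from the tables above;
the helper names are hypothetical, and the non-SoC power passed to
soc_only_diff_percent() is a made-up placeholder used only to show why
the SoC-only increase is larger than the total-power increase. It is
not a measurement.

```python
def power_diff_percent(without_boost_mw, with_boost_mw):
    """Relative power increase of the boosted run, in percent."""
    return (with_boost_mw - without_boost_mw) / without_boost_mw * 100.0


def soc_only_diff_percent(without_boost_mw, with_boost_mw, non_soc_mw):
    """Same delta, but relative to the SoC share only. Assumes the
    non-SoC power (display, etc.) is identical in both runs."""
    delta = with_boost_mw - without_boost_mw
    return delta / (without_boost_mw - non_soc_mw) * 100.0


# (w/o boost, w/ boost) total phone power in mW, taken from the tables.
results = {
    "Mobile Legends": (2996.09, 3294.10),
    "Genshin Impact (medium quality)": (5695.46, 6215.84),
    "Genshin Impact (high quality)": (6356.43, 6672.77),
}

for game, (without, with_boost) in results.items():
    print("%-32s %+.2f%%" % (game, power_diff_percent(without, with_boost)))

# Hypothetical split: pretend 1500 mW of the total is not SoC power.
HYPOTHETICAL_NON_SOC_MW = 1500.0
total = power_diff_percent(2996.09, 3294.10)
soc = soc_only_diff_percent(2996.09, 3294.10, HYPOTHETICAL_NON_SOC_MW)
print("SoC-only increase exceeds total increase:", soc > total)
```

The same absolute delta divided by a smaller baseline always gives a
larger percentage, which is the reason the SoC-only figure is higher.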

So far I have failed to find any concrete real-world workload that shows
a noticeable benefit from boosting. Do you have anything I can run to
check?
> [...]