Re: [RFC PATCH 0/1] sched/pelt: Change PELT halflife at runtime
From: Dietmar Eggemann
Date: Tue Oct 04 2022 - 05:39:21 EST
Hi Wei,
On 04/10/2022 00:57, Wei Wang wrote:
Please don't do top-posting.
> We have some data on an earlier build of Pixel 6a, which also runs a
> slightly modified "sched" governor. The tuning definitely has both
> performance and power impact on UX. With some additional user space
> hints such as ADPF (Android Dynamic Performance Framework) and/or the
> old-fashioned INTERACTION power hint, different trade-offs can be
> achieved with this sort of tuning.
>
>
> +---------------------------------------------------------+----------+----------+
> | Metrics                                                 |     32ms |      8ms |
> +---------------------------------------------------------+----------+----------+
> | Sum of gfxinfo_com.android.test.uibench_deadline_missed |   185.00 |   112.00 |
> | Sum of SFSTATS_GLOBAL_MISSEDFRAMES                      |    62.00 |    49.00 |
> | CPU Power                                               | 6,204.00 | 7,040.00 |
> | Sum of Gfxinfo.frame.95th                               |   582.00 |   506.00 |
> | Avg of Gfxinfo.frame.95th                               |    18.19 |    15.81 |
> +---------------------------------------------------------+----------+----------+
Which app is package `gfxinfo_com.android.test`? Is this UIBench? I've never
run it.
I'm familiar with `dumpsys gfxinfo <PACKAGE_NAME>`.
# adb shell dumpsys gfxinfo <PACKAGE_NAME>
...
** Graphics info for pid XXXX [<PACKAGE_NAME>] **
...
95th percentile: XXms <-- (a)
...
Number Frame deadline missed: XX <-- (b)
...
I assume that `Gfxinfo.frame.95th` is related to (a) and
`gfxinfo_com.android.test.uibench_deadline_missed` to (b)? I'm not sure
where `SFSTATS_GLOBAL_MISSEDFRAMES` comes from.
What's the Sum here? Did you run the test 32 times (582/18.19 ~ 32)?
[...]
> On Thu, Sep 29, 2022 at 11:59 PM Kajetan Puchalski
> <kajetan.puchalski@xxxxxxx> wrote:
>>
>> On Thu, Sep 29, 2022 at 01:21:45PM +0200, Peter Zijlstra wrote:
>>> On Thu, Sep 29, 2022 at 12:10:17PM +0100, Kajetan Puchalski wrote:
>>>
>>>> Overall, the problem being solved here is that, based on our testing, the
>>>> PELT halflife can occasionally be too slow to keep up in scenarios
>>>> where many frames need to be rendered quickly, especially on
>>>> high-refresh-rate phones and similar devices.
>>>
>>> But is it a problem of DVFS not ramping up quickly enough, or of the
>>> load-balancer not reacting to the increase in load? What aspect
>>> controlled by PELT is responsible for the improvement seen?
>>
>> Based on all the tests we've seen, jankbench or otherwise, the
>> improvement can mainly be attributed to the faster ramp-up of frequency
>> caused by the shorter PELT window while using schedutil. Alongside that,
>> the faster-rising signals also mean that the task gets migrated to
>> bigger CPUs sooner on big.LITTLE systems, which improves things too,
>> but it's mostly the frequency aspect of it.
>>
>> To establish that this benchmark is sensitive to frequency I ran some
>> tests using the 'performance' cpufreq governor.
>>
>> Max frame duration (ms)
>>
>> +------------------+-------------+----------+
>> | kernel | iteration | value |
>> |------------------+-------------+----------|
>> | pelt_1 | 10 | 157.426 |
>> | pelt_4 | 10 | 85.2713 |
>> | performance | 10 | 40.9308 |
>> +------------------+-------------+----------+
>>
>> Mean frame duration (ms)
>>
>> +---------------+------------------+---------+-------------+
>> | variable | kernel | value | perc_diff |
>> |---------------+------------------+---------+-------------|
>> | mean_duration | pelt_1 | 14.6 | 0.0% |
>> | mean_duration | pelt_4 | 14.5 | -0.58% |
>> | mean_duration | performance | 4.4 | -69.75% |
>> +---------------+------------------+---------+-------------+
>>
>> Jank percentage
>>
>> +------------+------------------+---------+-------------+
>> | variable | kernel | value | perc_diff |
>> |------------+------------------+---------+-------------|
>> | jank_perc | pelt_1 | 2.1 | 0.0% |
>> | jank_perc | pelt_4 | 2 | -3.46% |
>> | jank_perc | performance | 0.1 | -97.25% |
>> +------------+------------------+---------+-------------+
>>
>> As you can see, bumping up frequency can hugely improve the results
>> here. This is what's happening when we decrease the PELT window, just on
>> a much smaller, less drastic scale. It also explains specifically
>> where the increased power usage is coming from.