Re: [RFC PATCH 0/1] sched/pelt: Change PELT halflife at runtime
From: Wei Wang
Date: Mon Oct 03 2022 - 18:57:34 EST
We have some data on an earlier build of Pixel 6a, which also runs a
slightly modified "sched" governor. The tuning definitely has both
performance and power impact on UX. With some additional user space
hints such as ADPF (Android Dynamic Performance Framework) and/or the
old-fashioned INTERACTION power hint, different trade-offs can be
achieved with this sort of tuning.
+---------------------------------------------------------+----------+----------+
| Metrics                                                 | 32ms     | 8ms      |
+---------------------------------------------------------+----------+----------+
| Sum of gfxinfo_com.android.test.uibench_deadline_missed | 185.00   | 112.00   |
| Sum of SFSTATS_GLOBAL_MISSEDFRAMES                      | 62.00    | 49.00    |
| CPU Power                                               | 6,204.00 | 7,040.00 |
| Sum of Gfxinfo.frame.95th                               | 582.00   | 506.00   |
| Avg of Gfxinfo.frame.95th                               | 18.19    | 15.81    |
+---------------------------------------------------------+----------+----------+
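
To make the trade-off in the table easier to read, here is a minimal sketch
that computes the relative change of each metric when going from the 32ms to
the 8ms half-life. The numbers are copied from the table; the short metric
labels and the helper name relative_change are only illustrative.

```python
# Values copied from the table above: (32ms half-life, 8ms half-life).
results = {
    "uibench_deadline_missed (sum)": (185.00, 112.00),
    "SFSTATS_GLOBAL_MISSEDFRAMES (sum)": (62.00, 49.00),
    "CPU Power": (6204.00, 7040.00),
    "Gfxinfo.frame.95th (sum)": (582.00, 506.00),
    "Gfxinfo.frame.95th (avg)": (18.19, 15.81),
}


def relative_change(base, new):
    """Percentage change of `new` relative to `base`."""
    return (new - base) / base * 100.0


changes = {name: relative_change(a, b) for name, (a, b) in results.items()}

for name, pct in changes.items():
    print(f"{name:36s} {pct:+6.1f}%")
```

On these numbers the 8ms setting misses roughly 39% fewer UI deadlines while
drawing roughly 13% more CPU power.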
On Thu, Sep 29, 2022 at 11:59 PM Kajetan Puchalski
<kajetan.puchalski@xxxxxxx> wrote:
>
> On Thu, Sep 29, 2022 at 01:21:45PM +0200, Peter Zijlstra wrote:
> > On Thu, Sep 29, 2022 at 12:10:17PM +0100, Kajetan Puchalski wrote:
> >
> > > Overall, the problem being solved here is that based on our testing the
> > > PELT half life can occasionally be too slow to keep up in scenarios
> > > where many frames need to be rendered quickly, especially on high-refresh
> > > rate phones and similar devices.
> >
> > But it is a problem of DVFS not ramping up quick enough; or of the
> > load-balancer not reacting to the increase in load, or what aspect
> > controlled by PELT is responsible for the improvement seen?
>
> Based on all the tests we've seen, jankbench or otherwise, the
> improvement can mainly be attributed to the faster ramp up of frequency
> caused by the shorter PELT window while using schedutil. Alongside that
> the signals rising faster also mean that the task would get migrated
> faster to bigger CPUs on big.LITTLE systems which improves things too
> but it's mostly the frequency aspect of it.
>
> To establish that this benchmark is sensitive to frequency I ran some
> tests using the 'performance' cpufreq governor.
>
> Max frame duration (ms)
>
> +------------------+-------------+----------+
> | kernel | iteration | value |
> |------------------+-------------+----------|
> | pelt_1 | 10 | 157.426 |
> | pelt_4 | 10 | 85.2713 |
> | performance | 10 | 40.9308 |
> +------------------+-------------+----------+
>
> Mean frame duration (ms)
>
> +---------------+------------------+---------+-------------+
> | variable | kernel | value | perc_diff |
> |---------------+------------------+---------+-------------|
> | mean_duration | pelt_1 | 14.6 | 0.0% |
> | mean_duration | pelt_4 | 14.5 | -0.58% |
> | mean_duration | performance | 4.4 | -69.75% |
> +---------------+------------------+---------+-------------+
>
> Jank percentage
>
> +------------+------------------+---------+-------------+
> | variable | kernel | value | perc_diff |
> |------------+------------------+---------+-------------|
> | jank_perc | pelt_1 | 2.1 | 0.0% |
> | jank_perc | pelt_4 | 2 | -3.46% |
> | jank_perc | performance | 0.1 | -97.25% |
> +------------+------------------+---------+-------------+
>
> As you can see, bumping up frequency can hugely improve the results
> here. This is what's happening when we decrease the PELT window, just on
> a much smaller and not as drastic scale. It also explains specifically
> where the increased power usage is coming from.
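
As a rough illustration of why a shorter half-life ramps frequency faster,
the sketch below uses an idealized continuous model of the PELT signal for a
task that starts from zero utilization and then runs without sleeping. This
is a simplification (the kernel accumulates in discrete periods), and the
function names are hypothetical. It also assumes pelt_1 and pelt_4 in the
quoted tables correspond to 32ms and 8ms half-lives respectively.

```python
import math


def pelt_util(t_ms, halflife_ms):
    """Fraction of max utilization after running continuously for t_ms,
    starting from zero (idealized continuous approximation)."""
    return 1.0 - 0.5 ** (t_ms / halflife_ms)


def time_to_reach(fraction, halflife_ms):
    """Time in ms for the signal to rise from 0 to `fraction` of max."""
    return halflife_ms * math.log2(1.0 / (1.0 - fraction))


for halflife in (32, 8):
    t80 = time_to_reach(0.8, halflife)
    print(f"half-life {halflife:2d}ms: ~{t80:.1f}ms to reach 80% utilization")
```

Under this model the time to reach a given utilization scales linearly with
the half-life, so the signal that schedutil consumes rises about four times
faster at 8ms than at 32ms.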