Re: [PATCH v3 0/3] cpuidle: teo: Avoid stopping scheduler tick too often

From: Rafael J. Wysocki
Date: Wed Aug 02 2023 - 08:20:44 EST


On Tue, Aug 1, 2023 at 11:53 PM Kajetan Puchalski
<kajetan.puchalski@xxxxxxx> wrote:
>
> Hi Rafael,
>
> > Hi Folks,
> >
> > Patch [1/3] in this series is a v3 of this patch posted last week:
> >
> > https://lore.kernel.org/linux-pm/4506480.LvFx2qVVIh@kreacher/
> >
> > Patch [2/3] (this is the second version of it) addresses some bail out paths
> > in teo_select() in which the scheduler tick may be stopped unnecessarily too.
> >
> > Patch [3/3] replaces a structure field with a local variable (while at it)
> > and it is the same as its previous version.
> >
> > According to this message:
> >
> > https://lore.kernel.org/linux-pm/CAJZ5v0jJxHj65r2HXBTd3wfbZtsg=_StzwO1kA5STDnaPe_dWA@xxxxxxxxxxxxxx/
> >
> > this series significantly reduces the number of cases in which the governor
> > requests stopping the tick when the selected idle state is shallow, which is
> > incorrect.
> >
> > Thanks!
> >
> >
>
> I did some initial testing with this on Android (Pixel 6, Android 13).
>
> 1. Geekbench 6
>
> +---------------------------+---------------+-----------------+
> | metric                    | teo           | teo_tick        |
> +---------------------------+---------------+-----------------+
> | multicore_score           | 3320.9 (0.0%) | 3303.3 (-0.53%) |
> | score                     | 1415.7 (0.0%) | 1417.7 (0.14%)  |
> | CPU_total_power           | 2421.3 (0.0%) | 2429.3 (0.33%)  |
> | latency (AsyncTask #1)    | 49.41μ (0.0%) | 51.07μ (3.36%)  |
> | latency (labs.geekbench6) | 65.63μ (0.0%) | 77.47μ (18.03%) |
> | latency (surfaceflinger)  | 39.46μ (0.0%) | 36.94μ (-6.39%) |
> +---------------------------+---------------+-----------------+
>
> So the big picture for this workload looks roughly the same. The
> differences are too small for me to be confident that the score/power
> difference is the result of the patches and not something random in
> the system.
> The same goes for the latency: the difference for labs.geekbench6 stands
> out, but that is a fairly irrelevant task that sets up the benchmark, not
> the benchmark itself, so I don't think it is a big deal.
>
> +---------------+---------+------------+--------+
> | kernel        | cluster | idle_state | time   |
> +---------------+---------+------------+--------+
> | teo           | little  | 0.0        | 146.75 |
> | teo           | little  | 1.0        | 53.75  |
> | teo_tick      | little  | 0.0        | 63.5   |
> | teo_tick      | little  | 1.0        | 146.78 |
> +---------------+---------+------------+--------+
>
> +---------------+-------------+------------+
> | kernel        | type        | count_perc |
> +---------------+-------------+------------+
> | teo           | too deep    | 2.034      |
> | teo           | too shallow | 15.791     |
> | teo_tick      | too deep    | 2.16       |
> | teo_tick      | too shallow | 20.881     |
> +---------------+-------------+------------+
>
> The difference shows up in the idle numbers themselves: it looks like we
> get a big shift towards deeper idle on our efficiency cores (little
> cluster) and more missed wakeups overall, both too deep and too shallow.
>
> Notably, the percentage of too shallow sleeps on the performance cores has
> more or less doubled (2% + 0.8% -> 4.3% + 1.8%). This doesn't
> necessarily have to be an issue but I'll do more testing just in case.
>
> 2. JetNews (Light UI workload)
>
> +------------------+---------------+----------------+
> | metric           | teo           | teo_tick       |
> +------------------+---------------+----------------+
> | fps              | 86.2 (0.0%)   | 86.4 (0.16%)   |
> | janks_pc         | 0.8 (0.0%)    | 0.8 (-0.00%)   |
> | CPU_total_power  | 185.2 (0.0%)  | 178.2 (-3.76%) |
> +------------------+---------------+----------------+
>
> For the UI side, the frame data comes out the same on both variants, with
> lower power usage on teo_tick, which is nice to have.
>
> +---------------+---------+------------+-------+
> | kernel        | cluster | idle_state | time  |
> +---------------+---------+------------+-------+
> | teo           | little  | 0.0        | 25.06 |
> | teo           | little  | 1.0        | 12.21 |
> | teo           | mid     | 0.0        | 38.32 |
> | teo           | mid     | 1.0        | 17.82 |
> | teo           | big     | 0.0        | 30.45 |
> | teo           | big     | 1.0        | 38.5  |
> | teo_tick      | little  | 0.0        | 23.18 |
> | teo_tick      | little  | 1.0        | 14.21 |
> | teo_tick      | mid     | 0.0        | 36.31 |
> | teo_tick      | mid     | 1.0        | 19.88 |
> | teo_tick      | big     | 0.0        | 27.13 |
> | teo_tick      | big     | 1.0        | 42.09 |
> +---------------+---------+------------+-------+
>
> +---------------+-------------+------------+
> | kernel        | type        | count_perc |
> +---------------+-------------+------------+
> | teo           | too deep    | 0.992      |
> | teo           | too shallow | 17.085     |
> | teo_tick      | too deep    | 0.945      |
> | teo_tick      | too shallow | 15.236     |
> +---------------+-------------+------------+
>
> For the idle numbers here, all 3 clusters shift a bit towards deeper idle,
> but the overall miss rate is lower across the board, which is perfectly
> fine.
>
> TLDR:
> Mostly no change for a busy workload, no change + better power for a UI
> one. The patches make sense to me & the results look all right so no big
> problems at this stage. I'll do more testing (including the RFC you sent
> out a moment ago) over the next few days and send those out as well.
>
> Short of bumping into any other problems along the way, feel free to
> grab this if you'd like:
> Reviewed-and-tested-by: Kajetan Puchalski <kajetan.puchalski@xxxxxxx>

Thank you!