Re: [PATCH] PM / EM: Inefficient OPPs detection

From: Quentin Perret
Date: Fri Apr 23 2021 - 12:14:51 EST


On Thursday 22 Apr 2021 at 16:36:44 (+0100), Vincent Donnefort wrote:
> > > As used in the hot-path, the efficient table is a lookup table, generated
> > > dynamically when the perf domain is created. The complexity of searching
> > > a performance state is hence changed from O(n) to O(1). This also
> > > speeds up em_cpu_energy() even if no inefficient OPPs have been found.
> >
> > Interesting. Do you have measurements showing the benefits on wake-up
> > duration? I remember doing so by hacking the wake-up path to force tasks
> > into feec()/compute_energy() even when overutilized, and then running
> > hackbench. Maybe something like that would work for you?
> >
> > Just want to make sure we actually need all that complexity -- while
> > it's good to reduce the asymptotic complexity, we're looking at a rather
> > small problem (max 30 OPPs or so I expect?), so other effects may be
> > dominating. Simply skipping inefficient OPPs could be implemented in a
> > much simpler way I think.
> >
> > Thanks,
> > Quentin
>
> On the Pixel4, I used rt-app to generate a task whose duty cycle increases
> with each phase. Then, for each rt-app task placement, I measured how long
> find_energy_efficient_cpu() took to run. I repeated the operation several
> times to increase the sample count. Here's what I've got:
>
> ┌───────┬────────────┬────────┬───────────────────┬───────────────────┬────────────────┐
> │ Phase │ duty-cycle │  CPU   │      w/o LUT      │      w/ LUT       │                │
> │       │            │        ├───────────┬───────┼───────────┬───────┤      Diff      │
> │       │            │        │ Mean (ns) │ count │ Mean (ns) │ count │                │
> ├───────┼────────────┼────────┼───────────┼───────┼───────────┼───────┼────────────────┤
> │   0   │   12.5%    │ Little │     10791 │  3124 │     10657 │  3741 │ -1.2%   -134ns │
> ├───────┼────────────┼────────┼───────────┼───────┼───────────┼───────┼────────────────┤
> │   1   │    25%     │  Mid   │      2924 │  3097 │      2894 │  3740 │   -1%    -30ns │
> ├───────┼────────────┼────────┼───────────┼───────┼───────────┼───────┼────────────────┤
> │   2   │   37.5%    │  Mid   │      2207 │  3104 │      2162 │  3740 │   -2%    -45ns │
> ├───────┼────────────┼────────┼───────────┼───────┼───────────┼───────┼────────────────┤
> │   3   │    50%     │  Mid   │      1897 │  3119 │      1864 │  3717 │ -1.7%    -33ns │
> ├───────┼────────────┼────────┼───────────┼───────┼───────────┼───────┼────────────────┤
> │       │            │  Mid   │      1700 │   396 │      1609 │  1232 │ -5.4%    -91ns │
> │   4   │   62.5%    ├────────┼───────────┼───────┼───────────┼───────┼────────────────┤
> │       │            │  Big   │      1187 │  2729 │      1129 │  2518 │ -4.9%    -58ns │
> ├───────┼────────────┼────────┼───────────┼───────┼───────────┼───────┼────────────────┤
> │   5   │    75%     │  Big   │       984 │  3124 │       900 │  3693 │ -8.5%    -84ns │
> └───────┴────────────┴────────┴───────────┴───────┴───────────┴───────┴────────────────┘

Thanks for that. Do you have the stddev handy?
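
In case it helps, the stddev can be accumulated alongside the mean in a
single pass, without storing every sample. Below is a minimal sketch using
Welford's online algorithm. struct run_stats and the stats_*() helpers are
hypothetical names, not part of any existing tool, and the sketch returns
the sample variance (the stddev is its square root).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical accumulator for one (phase, CPU) cell of the table. */
struct run_stats {
	unsigned long count;	/* number of samples seen so far */
	double mean;		/* running mean, in ns */
	double m2;		/* running sum of squared deviations from the mean */
};

/* Fold one measured duration (in ns) into the running statistics. */
void stats_add(struct run_stats *s, double sample_ns)
{
	double delta;

	assert(s != NULL);
	delta = sample_ns - s->mean;
	s->count++;
	s->mean += delta / (double)s->count;
	/* Uses the old delta and the updated mean (Welford's update). */
	s->m2 += delta * (sample_ns - s->mean);
}

/* Sample variance in ns^2; the stddev is the square root of this value. */
double stats_variance(const struct run_stats *s)
{
	assert(s != NULL);
	return s->count > 1 ? s->m2 / (double)(s->count - 1) : 0.0;
}
```

Each measured find_energy_efficient_cpu() duration would be passed to
stats_add(), starting from a zero-initialized struct run_stats, and
stats_variance() read once the run is done.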

> Notice:
>
> * The CPU column describes which CPU ran the find_energy_efficient_cpu()
> function.
>
> * I modified my patch so that no inefficient OPPs are reported. This is to
> have a fairer comparison between the original table walk and the lookup
> table.

You mean to avoid the impact of the frequency selection itself? Maybe
pinning the frequencies in the cpufreq policy could do?

>
> * I removed from the table the results whose count was too low to be
> statistically significant.


Anyways, this looks like a small but consistent gain throughout, so it's a
win for the LUT :)
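
For anyone following the thread, here is a minimal C sketch of the trade-off
being measured: walking a frequency-sorted table versus indexing a
precomputed lookup table. This is not the code from the patch. The struct,
the function names (find_state_walk(), find_state_lut(), build_lut()), the
OPP values and the 100 MHz bucket size are all made up for illustration, and
the sketch assumes every OPP frequency is a multiple of the bucket size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified performance state (not struct em_perf_state). */
struct perf_state {
	unsigned long freq_khz;
	unsigned long power_mw;
};

#define NR_STATES	4
#define LUT_STEP_KHZ	100000UL	/* assumed bucket size */
#define LUT_SIZE	16		/* covers 0..1500000 kHz in 100 MHz steps */

/* Made-up OPPs, sorted by ascending frequency. */
const struct perf_state table[NR_STATES] = {
	{ 300000, 50 }, { 600000, 120 }, { 1000000, 260 }, { 1500000, 520 },
};

unsigned char lut[LUT_SIZE];

/* O(n): first state whose frequency covers the request, else the last one. */
size_t find_state_walk(unsigned long freq_khz)
{
	size_t i;

	for (i = 0; i < NR_STATES - 1; i++)
		if (table[i].freq_khz >= freq_khz)
			break;
	return i;
}

/* Done once, e.g. when the performance domain is created. */
void build_lut(void)
{
	size_t b;

	/* The highest OPP must fit in the last bucket. */
	assert(table[NR_STATES - 1].freq_khz / LUT_STEP_KHZ < LUT_SIZE);
	for (b = 0; b < LUT_SIZE; b++)
		lut[b] = (unsigned char)find_state_walk(b * LUT_STEP_KHZ);
}

/* O(1): round the request up to its bucket and index the table. */
size_t find_state_lut(unsigned long freq_khz)
{
	if (freq_khz >= table[NR_STATES - 1].freq_khz)
		return NR_STATES - 1;
	return lut[(freq_khz + LUT_STEP_KHZ - 1) / LUT_STEP_KHZ];
}
```

Both lookups return the same index for any request. The LUT trades a small
per-domain array, filled once by build_lut(), for a constant-time lookup.
With only a handful of OPPs the walk is already short, which may be why the
gains in the table above are modest.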

Thanks,
Quentin