Re: [PATCH v2 0/3] EM / PM: Inefficient OPPs

From: Lukasz Luba
Date: Wed May 26 2021 - 06:24:31 EST




On 5/26/21 10:38 AM, Viresh Kumar wrote:
On 26-05-21, 10:01, Vincent Donnefort wrote:
I originally considered adding the inefficiency knowledge to the CPUFreq table.

I wasn't talking about the cpufreq table here in the beginning, but calling
dev_pm_opp_disable(), which will eventually reflect in cpufreq table as well.

But I then gave up the idea for two reasons:

* The EM depends on having schedutil enabled. I don't think that any
other governor would then manage to rely on the inefficient OPPs. (also I
believe Peter had a plan to keep schedutil as the one and only governor)

Right, that EM is only there for schedutil.

I would encourage doing this even without the EM dependency, if
possible. It would be a good thing to have in general, for any driver that
wants it.

* The CPUfreq driver doesn't have to rely on the CPUfreq table. If the
knowledge about inefficient OPPs is stored in the latter, some drivers might
not be able to rely on the feature (you might say 'their loss' though :))

For those reasons, I thought that adding inefficient-OPP support into the
CPUfreq table would complicate the patchset a lot for no functional gain.

What about disabling the OPP in the OPP core itself ? So every user will get the
same picture.

There are SoCs which have OPPs every 100MHz, even at high frequencies.
They are used e.g. when thermal kicks in. We shouldn't disable them in
generic frameworks like OPP. They might be needed to provide enough CPU
capacity when the temperature is high. Imagine you have a board which
does some work: it sends and receives some UDP packets. The board has
been tested in an oven to confirm that it can still handle X
messages/sec, but using an OPP which in our heuristic is 'inefficient'.
You cannot go above it, because that would overheat the SoC; you might
go below and find the first 'efficient' OPP. You might harm this board's
performance if e.g. that OPP is 20% slower than the 'inefficient' one
which was tested by engineers.
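
To put numbers on the board example above, here is a minimal sketch in
plain C. It is not kernel code: struct opp, the table values and
highest_allowed() are all hypothetical, chosen only so that the gap
matches the 20% mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical OPP entry; this is NOT the kernel's struct dev_pm_opp. */
struct opp {
	unsigned int freq_khz;
	bool inefficient;	/* flagged by the energy heuristic */
};

/* Sorted by ascending frequency; all values are made up. */
static const struct opp table[] = {
	{ 1000000, false },
	{ 1200000, false },
	{ 1500000, true  },	/* 'inefficient', but validated in the oven */
	{ 1600000, false },	/* assumed to overheat the SoC under load */
};
#define NR_OPPS (sizeof(table) / sizeof(table[0]))

/*
 * Return the highest frequency that does not exceed the thermal cap.
 * With skip_inefficient set, flagged OPPs are treated as if they had
 * been disabled. Returns 0 if no OPP qualifies.
 */
static unsigned int highest_allowed(unsigned int cap_khz, bool skip_inefficient)
{
	unsigned int best = 0;
	size_t i;

	for (i = 0; i < NR_OPPS; i++) {
		if (table[i].freq_khz > cap_khz)
			break;
		if (skip_inefficient && table[i].inefficient)
			continue;
		best = table[i].freq_khz;
	}
	return best;
}
```

With a thermal cap of 1500000 kHz, highest_allowed(1500000, false) gives
1500000, while highest_allowed(1500000, true) gives 1200000, i.e. the
board loses 20% of the capacity it was validated with.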



Since the whole thing depends on EM and OPPs, I think we can actually do this.

When the cpufreq driver registers with the EM core, let's find all the
inefficient OPPs and disable them once and for all. Of course, this must be done
on a voluntary basis; a flag from the drivers will do. With this, we won't be
required to update anything at any of the governors' end.

We still need to keep the inefficient OPPs for thermal reasons.

How will that benefit us if that OPP is never going to run anyway ? We won't be

This OPP might still be used; Vincent's heuristic is just a 'hint'.
Schedutil will check policy->max and could clamp the 'efficient'
frequency it got back to the first allowed one, which might be
'inefficient'.
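
A rough sketch of that 'hint, then clamp' ordering, again in plain C
and not taken from the patchset: struct freq_entry, freq_table and
resolve_freq() are hypothetical names, and the real schedutil/cpufreq
code paths are more involved than this.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical frequency table entry, for illustration only. */
struct freq_entry {
	unsigned int freq_khz;
	bool inefficient;
};

/* Sorted by ascending frequency; all values are made up. */
static const struct freq_entry freq_table[] = {
	{ 1000000, false },
	{ 1200000, false },
	{ 1500000, true  },
	{ 1600000, false },
};
#define NR_FREQS (sizeof(freq_table) / sizeof(freq_table[0]))

static unsigned int resolve_freq(unsigned int target_khz,
				 unsigned int policy_max_khz)
{
	size_t i, pick = NR_FREQS - 1;

	/* Start from the lowest frequency that satisfies the request. */
	for (i = 0; i < NR_FREQS; i++) {
		if (freq_table[i].freq_khz >= target_khz) {
			pick = i;
			break;
		}
	}

	/* The hint: prefer the next higher 'efficient' entry. */
	while (pick + 1 < NR_FREQS && freq_table[pick].inefficient)
		pick++;

	/*
	 * The clamp: the policy limit wins over the hint, so step down
	 * until we fit, even if we land on an 'inefficient' entry.
	 */
	while (pick > 0 && freq_table[pick].freq_khz > policy_max_khz)
		pick--;

	return freq_table[pick].freq_khz;
}
```

With no cap in effect, resolve_freq(1300000, 1600000) skips the
'inefficient' 1500000 entry and returns 1600000. Once thermal lowers
the limit, resolve_freq(1300000, 1500000) returns 1500000, so the
'inefficient' OPP is exactly the one that ends up running.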

cooling down the CPU then, isn't it ?

The OPP is called 'inefficient' from our 'energy placement' angle. Other
folks from automotive, industrial or IoT, who are stress testing
SoCs and boards in various circumstances, might call our
'inefficient' perf state 'efficient' for their needs.

In our internal review I pointed out that we are optimizing for mobile
devices with this, and we might actually need an #ifdef, a config option
or a switch for this heuristic.