Re: [PATCH] arm64: dts: qcom: sc7180: Add 'sustainable_power' for CPU thermal zones
From: Doug Anderson
Date: Thu Sep 03 2020 - 11:12:44 EST
Hi,

On Thu, Sep 3, 2020 at 5:17 AM Matthias Kaehlcke <mka@xxxxxxxxxxxx> wrote:
>
> Hi Rajendra,
>
> On Thu, Sep 03, 2020 at 11:00:52AM +0530, Rajendra Nayak wrote:
> >
> > On 9/3/2020 10:14 AM, Rajendra Nayak wrote:
> > >
> > > On 9/2/2020 9:02 PM, Doug Anderson wrote:
> > > > Hi,
> > > >
> > > > On Tue, Sep 1, 2020 at 10:36 PM Rajendra Nayak <rnayak@xxxxxxxxxxxxxx> wrote:
> > > > >
> > > > >
> > > > > > * In terms of the numbers here, I believe that you're claiming that we
> > > > > > can dissipate 768 mW * 6 + 1202 mW * 2 = ~7 Watts of power. My memory
> > > > > > of how much power we could dissipate in previous laptops I worked on
> > > > > > is a little fuzzy, but that doesn't seem insane for a passively-cooled
> > > > > > laptop. However, I think someone could conceivably put this chip in a
> > > > > > smaller form factor. In such a case, it seems like we'd want these
> > > > > > things to sum up to ~2000 (if it would ever make sense for someone to
> > > > > > put this chip in a phone) or ~4000 (if it would ever make sense for
> > > > > > someone to put this chip in a small tablet). It seems possible that,
> > > > > > to achieve this, we might have to tweak the
> > > > > > "dynamic-power-coefficient".
> > > > >
> > > > > DPC values are calculated (at a SoC) by actually measuring max power at various
> > > > > frequency/voltage combinations by running things like dhrystone.
> > > > > How would the max power a SoC can generate depend on form factors?
> > > > > How much it can dissipate sure is, but then I am not super familiar how
> > > > > thermal frameworks end up using DPC for calculating power dissipated,
> > > > > I am guessing they don't.
> > > > >
> > > > > > I don't know how much thought was put
> > > > > > into those numbers, but the fact that the little cores have a super
> > > > > > round 100 for their dynamic-power-coefficient makes me feel like they
> > > > > > might have been more schwags than anything. Rajendra maybe knows?
> > > > >
> > > > > FWIK, the values are always scaled and normalized to 100 for silver and
> > > > > then used to derive the relative DPC number for gold. If you see the DPC
> > > > > for silver cores even on sdm845 is a 100.
> > > > > Again these are not estimations but based on actual power measurements.
> > > >
> > > > The scaling to 100 doesn't seem to match how the thermal framework is
> > > > using them. Take a look at of_cpufreq_cooling_register(). It takes
> > > > the "dynamic-power-coefficient" and passes it as "capacitance" into
> > > > __cpufreq_cooling_register(). That's eventually used to compute
> > > > power, which is documented in the code to be in mW.
> > > >
> > > > power = (u64)capacitance * freq_mhz * voltage_mv * voltage_mv;
> > > > do_div(power, 1000000000);
> > > >
> > > > /* power is stored in mW */
> > > > freq_table[i].power = power;
> > > >
> > > > That's used together with "sustainable-power", which is the attribute
> > > > that Matthias is trying to set. That value is documented to be in mW
> > > > as well.
> > > >
> > > > ...so if the silver cores are always scaled to 100 regardless of how
> > > > much power they actually draw then it'll be impossible to actually
> > > > think about "sustainable-power" as a mW value. Presumably we either
> > > > need to accept that fact (and ideally document it) or we need to
> > > > change the values for silver / gold cores (we could still keep the
> > > > relative values the same and just scale them).
> > >
> > > That sounds reasonable (still keep the relative values and scale them)
> > > I'll get back on what those scaled numbers would look like, and try to
> > > get some sense of why this scaling to 100 was done (like you said
> > > I don't see any documentation on this), but I see at least a few other non-qcom
> > > SoCs doing this too in mainline (like rockchip/rk3399)

I don't think I was too closely involved in these numbers on rk3399,
but as far as I can tell the 100 number came from:

https://crrev.com/c/364003

...interestingly enough, the number _wasn't_ scaled to 100 at first
(it was a number close to 100) and was then changed to scale to 100.
That makes it seem like 100, though awfully round, was at least
loosely based on fact for rk3399.

In any case, the devicetree bindings make it pretty clear that this
value should be based in reality and not some bogus number.

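To make the mW point concrete, here is a minimal sketch that mirrors
the formula quoted above from the cpufreq cooling code. The function
name and the 1800 MHz / 900 mV operating point are made up for
illustration; they are not sc7180 or rk3399 numbers.

```c
#include <stdint.h>

/*
 * Same arithmetic as the snippet quoted above:
 * coefficient * MHz * mV * mV / 10^9, with the result read as mW.
 * dyn_power_mw() is a hypothetical helper, not a kernel function.
 */
uint64_t dyn_power_mw(uint32_t coefficient, uint32_t freq_mhz,
		      uint32_t voltage_mv)
{
	uint64_t power = (uint64_t)coefficient * freq_mhz *
			 voltage_mv * voltage_mv;

	/* Integer division, like do_div() in the quoted code. */
	return power / 1000000000ULL;
}
```

With a coefficient of 100 at that hypothetical operating point this
returns 145. Whether that means 145 mW depends entirely on whether the
coefficient came from a real measurement. Doubling the coefficient
returns 291, so any rescaling of the coefficients changes every number
that ends up being compared against "sustainable-power".
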
> > On second thoughts, why wouldn't a relative 'sustainable-power' value work?
> > On every device, one would need to do the exercise that Matthias did to come
> > up with the OPP at which we can sustain max CPU/GPU loads anyway.
>
> You assume that a thermal zone only has cooling devices of the same type (or
> with the same fake unit for power consumption). This falls apart when multiple
> types are used, which is common.
>
> Also sustainable power is only a derived value, the lying already starts in
> the energy model, which is used by EAS, so a fake unit could cause further
> problems.
>
> > I mean even if we do change the DPC values to match actual power, Matthias would
> > still observe that we can sustain at the very same OPP and not any different.
> > It's just that the mW values that are passed to kernel are relative and not
> > absolute. My worry is that perhaps no SoC vendor wants to put these absolute numbers
> > out.
>
> This is pretty much 'security' by obscurity. It would be relatively easy to
> measure actual power consumption at different CPU speeds and derive the DPC
> values from that.

Right, I was going to say that. Specifically:

* Anyone that actually gets one of these chips can just measure it
pretty trivially. Run the core at a certain speed and measure with
the smart battery. Run at a different speed and measure again.

* Presumably the power consumption of different types of cores in
Qualcomm SoCs of the same generation is roughly equivalent. So I
could go and grab a Pixel 4a and put AOSP on it and measure the power
consumption and presumably get pretty close numbers for big and little
power coefficients. I don't know for sure if Pixel 4a's SoC is
officially the same generation but I'd bet it's close.

* Presumably someone would be able to get a pretty good guess by
figuring out the form factor and working backwards. It sounds as if
thermal dissipation (in terms of Watts) for various form factor
devices is somewhat standard. Maybe this is more so for phones /
tablets than laptops which might have bigger heat pipes or active
cooling, but still. Someone could do the math pretty easily.
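
As a rough sketch of the first bullet: if you take total power
readings at two operating points and assume everything other than CPU
dynamic power stays the same between them (a simplification, since
leakage also varies with voltage), the coefficient falls out of the
difference. The helper name and the sample numbers below are invented
for illustration.

```c
/*
 * Hypothetical helper: estimate a dynamic-power-coefficient from two
 * whole-system power readings (in mW) taken at two operating points.
 * Assumes the non-dynamic part of the power is identical in both
 * readings, so it cancels in the subtraction. The two operating
 * points must differ, otherwise this divides by zero.
 */
double estimate_dpc(double p1_mw, double f1_mhz, double v1_mv,
		    double p2_mw, double f2_mhz, double v2_mv)
{
	double x1 = f1_mhz * v1_mv * v1_mv;
	double x2 = f2_mhz * v2_mv * v2_mv;

	/* Inverse of: power_mw = coefficient * MHz * mV^2 / 10^9 */
	return (p2_mw - p1_mw) * 1e9 / (x2 - x1);
}
```

For example, made-up readings of 349.0 mW at 1000 MHz / 700 mV and
445.8 mW at 1800 MHz / 900 mV come out at roughly 100.
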

I guess if you're really worried about protecting this then you can
delay posting it for brand new chipsets using a new type of technology
until the product is almost ready to ship, but for sc7180 it doesn't
feel like this is something worth fighting about.

-Doug