Re: [PATCH] arm64: dts: qcom: sc7180: Add 'sustainable_power' for CPU thermal zones
From: Doug Anderson
Date: Tue Sep 01 2020 - 16:19:30 EST
Hi,
On Tue, Sep 1, 2020 at 10:07 AM Matthias Kaehlcke <mka@xxxxxxxxxxxx> wrote:
>
> On Thu, Aug 13, 2020 at 11:30:33AM -0700, Matthias Kaehlcke wrote:
> > The 'sustainable_power' attribute provides an estimate of the sustained
> > power that can be dissipated at the desired control temperature. One
> > could argue that this value is not necessarily the same for all devices
> > with the same SoC, which may have different form factors or thermal
> > designs. However there are reasons to specify a (default) value at SoC
> > level for SC7180: most importantly, if no value is specified at all the
> > power_allocator thermal governor (aka 'IPA') estimates a value, using the
> > minimum power of all cooling devices of the zone, which can result in
> > overly aggressive thermal throttling. For most devices an approximate
> > conservative value should be more useful than the minimum guesstimate
> > of power_allocator. Devices that need a different value can overwrite
> > it in their <device>.dts. Also the thermal zones for SC7180 have a high
> > level of granularity (essentially one for each function block), which
> > makes it more likely that the default value just works for many devices.
> >
> > The values correspond to 1901 MHz for the big cores, and 1804 MHz for
> > the small cores. The values were determined by limiting the CPU
> > frequencies to different max values and launching a bunch of processes
> > that cause high CPU load ('while true; do true; done &' is simple and
> > does a good job). A frequency is deemed sustainable if the CPU
> > temperatures don't rise (consistently) above the second trip point
> > ('control temperature', 95 degC in this case). Once the highest
> > sustainable frequency is found, the sustainable power can be calculated
> > by multiplying the energy consumption per core at this frequency (which
> > can be found in /sys/kernel/debug/energy_model/) with the number of
> > cores that are specified as cooling devices.
> >
> > The sustainable frequencies were determined at room temperature
> > on a device without heat sink or other passive cooling elements.
I'm curious: was this a bare board, or a device in a case? Hrm, I'm
not sure which one would be worse at heat dissipation, but I would
imagine that being inside a plastic case might be worse?
> > Signed-off-by: Matthias Kaehlcke <mka@xxxxxxxxxxxx>
> > ---
> > If maintainers think 'sustainable_power' should be specified at
> > device level (with which I conceptually agree) I'm fine with
> > doing that, just seemed it could be useful to have a reasonable
> > 'default' at SoC level in this case.
>
> Any comments on this?
I'm not massively familiar with this area of the code, but I guess I
shouldn't let that stop me from having an opinion! :-P
* I would agree that it seems highly unlikely that someone would put
one of these chips in a device that could only dissipate the heat from
the lowest OPP, so having some higher estimate definitely makes sense.
* In terms of the numbers here, I believe that you're claiming that we
can dissipate 768 mW * 6 + 1202 mW * 2 = ~7 Watts of power. My memory
of how much power we could dissipate in previous laptops I worked on
is a little fuzzy, but that doesn't seem insane for a passively-cooled
laptop. However, I think someone could conceivably put this chip in a
smaller form factor. In such a case, it seems like we'd want these
things to sum up to ~2000 (if it would ever make sense for someone to
put this chip in a phone) or ~4000 (if it would ever make sense for
someone to put this chip in a small tablet). It seems possible that,
to achieve this, we might have to tweak the
"dynamic-power-coefficient". I don't know how much thought was put
into those numbers, but the fact that the little cores have a super
round 100 for their dynamic-power-coefficient makes me feel like they
might have been more schwags than anything. Rajendra maybe knows?
* I'm curious about the fact that there are two numbers here: one for
littles and one for bigs. If I had to guess I'd say that since all
the cores are in one package so the contributions kinda need to be
thought of together, right? If we're sitting there thermally
throttled then we'd want to pick the best perf-per-watt for the
overall package. This is why your patch says we can sustain the
little cores at max and the big cores get whatever is left over,
right?
* Should we be leaving some room in here for the GPU? ...or I guess
once we list it as a cooling device we'll have to decrease the amount
the CPUs can use?
So I guess the tl; dr is:
a) We should check "dynamic-power-coefficient" and possibly adjust.
b) I don't think the "conservative" by-default numbers should add up
to 7 Watts. I could be convinced that this chip is not intended for
phones and thus we could have it add up to 4 Watts, but 7 Watts seems
too much.
-Doug