Re: [RFC PATCH 0/4] thermal: Introduce support for monitoring falling temperature

From: Thara Gopinath
Date: Wed Jul 15 2020 - 19:10:48 EST




On 7/15/20 4:27 AM, Zhang Rui wrote:
Hi, Thara,

On Tue, 2020-07-14 at 17:39 -0400, Thara Gopinath wrote:


For example, to support this, we can either
introduce both "cold" trip points and "warming" devices, and introduce
new logic in the thermal framework and governors to handle them,
or
introduce "cold" trip points and "warming" devices, but only
semantically, and treat them just like normal trip points and cooling
devices. We would strictly define cooling state 0 as the state that
generates the most heat, and the max cooling state as the state that
generates the least heat. Then, say, we have a trip point at -10C: the
"warming" device is set to cooling state 0 when the temperature is
lower than -10C, and in most cases this thermal zone is in an
"overheating" state (temperature higher than -10C), so the "warming"
device for this thermal zone is "throttled" to generate as little heat
as possible. And this is pretty much what the current code has always
been doing, right?
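
As a rough illustration of the second option above, here is a minimal
sketch in C. It is not thermal framework code: struct warming_dev and
pick_state() are hypothetical names made up for this example, and the
only assumption borrowed from the kernel is that temperatures are
expressed in millidegrees Celsius.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type for illustration; not a thermal framework structure. */
struct warming_dev {
	unsigned long max_state;	/* highest cooling state: least heat */
};

/*
 * Pick a cooling state for a "warming" device bound to an ordinary trip
 * point. Temperatures are in millidegrees Celsius.
 */
static unsigned long pick_state(const struct warming_dev *dev,
				int temp_mc, int trip_mc)
{
	assert(dev != NULL);

	/* Below the trip: state 0, the state that generates the most heat. */
	if (temp_mc < trip_mc)
		return 0;

	/* At or above the trip: zone is "overheating", throttle fully. */
	return dev->max_state;
}
```

With a trip at -10000 (that is, -10C) and max_state = 3, a reading of
-15000 selects state 0, while a reading of 25000 selects state 3.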


IMHO, the thermal framework should move in a direction where the term
"mitigation" is used rather than cooling or warming. In that case
"cooling dev" and "warming dev" would become
"temp-mitigating-dev". Going by this, I think what you mention as
option 1 is more suitable, where new logic is introduced into the
framework and governors to handle the trip points marked as "cold".

Also, in the current set of requirements, we have a few power domain
rails and other resources that are used in the thermal framework
exclusively for warming, as in they are never used for cooling
down a zone. But one of the requirements we have discussed is
for cpufreq and gpu scaling to behave as warming devices, where
the minimum operating point/voltage of the relevant cpu/gpu is
restricted.
For this case, Daniel had the suggestion of introducing negative
states for what are presently defined as cooling devices. So cooling
dev / temp-mitigation-dev states can range from, say, -3 to 5, with 0
as the good state where no mitigation is happening. This is an
interesting idea, though I have not prototyped it yet.
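
To make the negative-state idea concrete, here is a minimal sketch in C.
Nothing like this exists in the thermal framework today: struct
temp_mitigating_dev, clamp_state() and state_mode() are hypothetical
names, and the assumption that negative states mean warming and positive
states mean cooling is only one reading of the suggestion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration; not an existing kernel API. */
enum mitigation_mode { MITIGATE_WARM, MITIGATE_NONE, MITIGATE_COOL };

struct temp_mitigating_dev {
	int min_state;	/* most aggressive warming, e.g. -3 */
	int max_state;	/* most aggressive cooling, e.g. 5 */
};

/* Clamp a requested state into the range the device supports. */
static int clamp_state(const struct temp_mitigating_dev *dev, int requested)
{
	assert(dev != NULL);
	/* State 0 (no mitigation) must always be a valid state. */
	assert(dev->min_state <= 0 && dev->max_state >= 0);

	if (requested < dev->min_state)
		return dev->min_state;
	if (requested > dev->max_state)
		return dev->max_state;
	return requested;
}

/* Assumed convention: negative warms, positive cools, 0 does nothing. */
static enum mitigation_mode state_mode(int state)
{
	if (state < 0)
		return MITIGATE_WARM;
	if (state > 0)
		return MITIGATE_COOL;
	return MITIGATE_NONE;
}
```

A device that can only warm would set max_state to 0, and a
cooling-only device would set min_state to 0, so one type could cover
both kinds of device.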

Agreed. If some devices support both "cooling" and "warming", we should
have only one "temp-mitigating-dev" instead.


I cannot say which one is better for now as I don't have the
background of this requirement. It's nice that Thara sent this RFC
series for discussion, but from an upstream point of view, I'd prefer
to see a full stack solution before taking any code.

We did a session at ELC on this requirement. Here is the link to
the presentation. Hopefully it gives you some background on this.

yes, it helps. :)


https://elinux.org/images/f/f7/ELC-2020-Thara-Ram-Linux-Kernel-Thermal-Warming.pdf

I have sent across some patches for introducing a generic power domain
warming device, which is under review by Daniel.

So how do you want to proceed on this? Can you elaborate a bit more on
what you mean by a full stack solution?

I mean, the patches and the idea look good to me, just with some minor
comments. But applying this patch series alone does not bring us
anything, because we don't have a thermal zone driver that supports
cold trip points, right?
I'd like to see this patch series together with the support in
thermal_core/governors and real users, like updated/new thermal
zone/cdev drivers that support the cold trip point and warming
actions.
Otherwise I'm concerned that this piece of code may be changed back
and forth while prototyping the rest of the support.

Got it! I will try to include more pieces in the next version.


thanks,
rui


--
Warm Regards
Thara