Re: RFC: device thermal limits represented in device tree nodes

From: Eduardo Valentin
Date: Wed Aug 07 2013 - 16:20:00 EST


Pawel, all,

On 06-08-2013 07:14, Pawel Moll wrote:
> Apologies about the delay, I was "otherwise engaged" for a week...
>

I apologize for my delay as well; I was also "engaged" for a week or so.

> I hope you haven't lost all motivation to work on this subject, as it's
> really worth the while!

Not at all, quite the opposite. Although I was looking at some other
stuff, I also got this series tested on different boards and wrote down
a couple of improvements I will be working on in the coming days. Indeed,
it is worth moving forward with this work.

>
> On Fri, 2013-07-26 at 20:55 +0100, Eduardo Valentin wrote:
>> On 25-07-2013 13:33, Pawel Moll wrote:
>>> On Thu, 2013-07-25 at 18:20 +0100, Eduardo Valentin wrote:
>>>>>> thermal_zone {
>>>>>> type = "CPU";
>>>>>
>>>>> So what does this exactly mean? What is so special about CPU? What other
>>>>> types you've got there? (Am I just lazy not looking at the numerous
>>>>> links you provided? ;-)
>>>>
>>>> Hehehe. OK. Type is supposed to describe what your zone is representing.
>>>
>>> As in "a name"? So, for example "The board", "PSU"? What I meant to ask
>>> was: does the string carry any meaning?
>
> You haven't commented on this...

The string is supposed to carry meaning, yes. A couple of commonly used
ones: CPU, GPU, PCB, LCD.

>
>>>>>> trips {
>>>>>> alert@100000{
>>>>>> temperature = <100000>; /* milliCelsius */
>>>>>> hysteresis = <2000>; /* milliCelsius */
>>>>>> type = <THERMAL_TRIP_PASSIVE>;
>>>>>> };
>>>>>> crit@125000{
>>>>>> temperature = <125000>; /* milliCelsius */
>>>>>> hysteresis = <2000>; /* milliCelsius */
>>>>>> type = <THERMAL_TRIP_CRITICAL>;
>>>>>> };
>>>>>> };
>>>>>> bind_params {
>>>>>> action@0{
>>>>>> cooling_device = "thermal-cpufreq";
>>>>>
>>>>> Why is it a string? It seems very Linux-y... (cpufreq) Is there any
>>>>> particular reason not to have phandles to the fans that have any impact
>>>>> on the zone?
>>>>
>>>> Because fans are not the only way to cool your system, especially those
>>>> systems that don't feature fans. Managing the speed of your CPU is one
>>>> example of lowering temperature without fans. Managing the load on your
>>>> system is another way. These are obviously, virtual concepts. And
>>>> because we have physical ways and logical ways to cool the zone, then I
>>>> didn't put a phandle to a device there.
>>>
>>> "virtual concepts"... This is where my problem lies... It's not hardware
>>> so it doesn't seem to belong in the tree at the first sight. Shouldn't
>>
>> Yeah, in fact, this is exactly the point that creates most of the
>> disagreement. You may check Guenter's arguments against this proposal
>> (in my original RFC email, there is a link to it).
>>
>> Well, if one doesn't want to see this as a 'virtual concept', one could say
>> the cooling device is the cpu itself:
>> cooling_device = <&cpu0>;
>
> Would this create any particular problem at the driver/framework side?

In that case, I believe the CPUfreq driver must be thermal aware. We
would also need a way for the cpufreq driver to instantiate the cooling
mechanism whenever such a link exists. But I need to think a little bit
more about this; I will come back to this point soon.
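
Just to make the idea concrete, the link could look roughly like the
fragment below. This is only a sketch under assumptions: the cpu0 label
and the shape of the cpus node are made up for illustration, and nothing
here is a settled binding.

```dts
/* Hypothetical sketch only; not a settled binding. */
cpus {
	cpu0: cpu@0 {
		/* existing CPU node; the cpufreq driver would have to
		 * register the cooling device when it finds the link */
	};
};

thermal_zone {
	type = "CPU";
	bind_params {
		action@0 {
			/* phandle to the CPU instead of "thermal-cpufreq" */
			cooling_device = <&cpu0>;
		};
	};
};
```

With this form the tree only says "the CPU can cool this zone"; how the
cooling is done (frequency capping or something else) stays in the driver.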

>
>>> it focus on "physical data" instead? As in: point at devices that have
>>> some impact on the conditions? For example, you can say "please, do the
>>> right thing to cool your environment down" to both CPU and fan, can't
>>> you? The "cooling driver" for the CPU would know that it has to slow
>>> down, while a driver for the fan would know that it has to speed up ;-)
>>>
>>> What I'm trying to say is that in my opinion the tree should simply link
>>> the object, the sensor and the actuator. Nothing more, nothing less.
>>
>> OK. I think it would be a little unfair to have only links, without
>> describing what this link is supposed to be or how it is supposed to be
>> used. In previous discussions, I have mentioned two similar examples
>> already existing in DT. Here are they: regulator bindings, one does not
>> describe only which device connects to which regulator, but also needs
>> to describe, voltage limits, current limits, offsets, and other
>> properties. And an existing 'virtual concept' would be predefined CPU
>> OPPs, that feed the opp layer. Those are configurations of the hardware
>> that define a 'virtual' concept of operating point.
>>
>> So, saying we need to describe only physical connections or touchable
>> things would be a little unfair, IMO. Besides, thermal is still physical
>> :-).
>
> Believe me, I'm trying to be as fair as possible :-) and I see a lot of
> value in describing the thermal properties of the platforms in the tree.
> It's just that we really want to focus on describing the hardware, not
> policies. And as you have already spent so much time on the matter, you
> are in the best position to find the best set of *physical* properties
> that would allow to make the right decision in the code. Could you,
> please, try to make one step back and have another look at the problem?
> What input data (as in: numbers :-) would you need to get what you want?
> Are those numbers characteristic to the specific device (they probably
> should live in the driver then) or to the board/platform (tree without a
> doubt).

OK. My point was just that linking objects without saying when (in the
temperature domain) to start using the link may be incomplete. Trip
points are really HW dependent, because the power dissipation profile
changes from HW design to HW design. In other words, one can say "use
the fan device to cool the GPU", but depending on your GPU you may start
using the fan when the sensor reads 100C, 85C or 110C (just picking
numbers). Besides, the same IP may be used in different ambient
conditions, which may change its leakage level and thus its thermal
dissipation profile, resulting in different trip points. So mapping all
these HW characteristics inside specific drivers does not sound like the
right path to me. That is why I believe pointing out which trips to use
is part of the HW description.
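
To illustrate, here is a sketch of two boards using the same GPU IP and
the same "fan cools GPU" link, but with different trip points. The
numbers are the made-up ones from above; the fan0 label and the trip
property are hypothetical and not part of the posted series, and I am
assuming THERMAL_TRIP_ACTIVE is defined alongside the other trip types.

```dts
/* Board A .dts (hypothetical): fan kicks in at 85 C */
thermal_zone {
	type = "GPU";
	trips {
		gpu_alert: alert@85000 {
			temperature = <85000>; /* milliCelsius */
			hysteresis = <2000>; /* milliCelsius */
			type = <THERMAL_TRIP_ACTIVE>;
		};
	};
	bind_params {
		action@0 {
			cooling_device = <&fan0>; /* hypothetical fan label */
			trip = <&gpu_alert>; /* hypothetical property */
		};
	};
};

/* Board B .dts (hypothetical): same link, only the trip node differs */
thermal_zone {
	type = "GPU";
	trips {
		gpu_alert: alert@100000 {
			temperature = <100000>; /* milliCelsius */
			hysteresis = <2000>; /* milliCelsius */
			type = <THERMAL_TRIP_ACTIVE>;
		};
	};
	/* bind_params identical to board A */
};
```

The link itself is identical on both boards; only the temperature at
which it is used changes, and that number comes from the board design,
not from the fan or GPU driver.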

I must agree, though, that saying which governor to use is pure policy
definition.

I will go through our discussion, consider the points raised and repost
the series as soon as possible. I will not let this one fall through the
cracks, no worries.

Thanks for your input, BTW.

>
> Thanks!
>
> Pawel
>
>
>
>


--
You have got to be excited about what you are doing. (L. Lamport)

Eduardo Valentin
