Re: [RFC] ARM: dts: omap36xx: Enable thermal throttling

From: Daniel Lezcano
Date: Sat Sep 14 2019 - 05:54:12 EST




Hi Nikolaus,

On 13/09/2019 22:34, H. Nikolaus Schaller wrote:

[ ... ]

>> The governor continues to read the temperature and see the temperature
>> decrease, it does nothing.
>
> Ah, I think our misunderstanding is that the governor "enables" and
> "disables" a set of OPPs. Rather it goes down or up in the list if
> above or below a trip point.

Right.

>> The governor continues to read the temperature, sees that it decreases
>> and is below 75°C, and decreases the state (state=>1); the OPP changes
>> to 2.36GHz.
>>
>> The temperature then increases, etc ...
>>
>> Actually the governors do more than that but it is for the example.
>>
>> So it is a bad idea to set boundaries for the cooling device state as
>> that may prevent the governor from taking the right decision for the
>> cooling effect. Imagine in the example above, we set the max state to 1
>> for the cooling device: that would mean the governor won't be able to
>> stop the temperature increasing, thus ending up in a hard reboot.
>
> Well, the data sheet only requires that the high speed OPPs are only
> used below 90°C. If I understand correctly if we set the trip point to
> 90°C it will simply go down through the full list of OPPs. This will
> clearly avoid the high speed OPPs (and potentially some low-speed
> ones, but that does not harm).

Yes, right.
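
As a minimal sketch of that setup, assuming a single passive trip point
and an unrestricted cooling map: the &cpu and &bandgap labels, the
polling delays and the hysteresis are assumptions for illustration, not
values taken from the patch under review.

```dts
#include <dt-bindings/thermal/thermal.h>

/ {
	thermal-zones {
		cpu_thermal: cpu-thermal {
			polling-delay-passive = <250>;	/* ms, assumed value */
			polling-delay = <1000>;		/* ms, assumed value */
			thermal-sensors = <&bandgap>;	/* assumed sensor label */

			trips {
				cpu_alert0: cpu-alert {
					/* 90°C as discussed above, in millicelsius */
					temperature = <90000>;
					hysteresis = <2000>;	/* assumed value */
					type = "passive";
				};
			};

			cooling-maps {
				map0 {
					trip = <&cpu_alert0>;
					/* no bounds: the governor may use every state */
					cooling-device = <&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
				};
			};
		};
	};
};
```

A lower trip temperature, such as the 85°C mentioned earlier in the
thread, leaves some margin before the 90°C limit is reached.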

> So our approach "how to make it disable these two OPPs" seems to be
> wrong. Rather, we have to think "make sure the temperature
> stays below 90°C".

Your approach is not wrong, it proves there is a limitation in the
thermal/cpufreq framework.

There is the 'turbo mode' [1] which describes exactly what you want but
I'm not sure it is fully implemented.

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/opp/opp.txt#n138
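
For illustration only, a sketch of how the two high speed OPPs could be
marked with the 'turbo-mode' property from that binding. The table
label is hypothetical, the 300MHz entry and the voltage properties are
left out, and whether the thermal framework honours the flag in the way
needed here is exactly the open question above.

```dts
/ {
	cpu0_opp_table: opp-table {	/* hypothetical label */
		compatible = "operating-points-v2";

		opp-600000000 {
			opp-hz = /bits/ 64 <600000000>;
		};

		opp-800000000 {
			opp-hz = /bits/ 64 <800000000>;
			turbo-mode;	/* only usable within thermal limits */
		};

		opp-1000000000 {
			opp-hz = /bits/ 64 <1000000000>;
			turbo-mode;
		};
	};
};
```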

> And is it true that we do not have to define mapping for the "critical"
> trip points?

Right, you don't have to, it is optional. But the critical trip point
will make your system shut down in case something goes wrong, for
example an external heating source, like the sun or whatever. It is good
to set a high temperature to force the shutdown. Usually it is below the
hardware reset temperature.
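
A sketch of such a trip point, to be placed in the "trips" node of the
thermal zone. The temperature is a placeholder, not a value from any
data sheet; pick one below the hardware reset temperature of the SoC.
No entry in "cooling-maps" has to refer to it.

```dts
trips {
	cpu_crit: cpu-crit {
		temperature = <105000>;	/* placeholder, millicelsius */
		hysteresis = <2000>;	/* assumed value */
		type = "critical";	/* the kernel shuts the system down */
	};
};
```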


>>>> Now the different combinations:
>>>>
>>>> <&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT> the governor will use the state
>>>> 0 to 7.
>>>>
>>>> <&cpu THERMAL_NO_LIMIT 2> the governor will use the state 0 to 2
>>>
>>> What would be the difference between <&cpu THERMAL_NO_LIMIT 2> and
>>> <&cpu 0 2> ?
>>> (if there is any)
>>
>> There is no difference.
>>
>>
>>>> <&cpu 1 2> the governor will use the state 1 and 2. That means there is
>>>> always the cooling effect as the governor won't set it to zero thus
>>>> stopping the mitigation.
>>>
>>> For the purposes of the board in question, we have 4 operating points,
>>> 300MHz, 600MHz, 800MHz and 1GHz. Once the board reaches 90C, we need
>>> them to cease operation at 800MHz and 1GHz and only permit operation
>>> at 300MHz and 600MHz. I am going under the assumption that the cpu
>>> index[0] would be for 300MHz, index[1] = 600MHz, etc.
>>>
>>> If I am interpreting your comment correctly, I should set <&cpu
>>> THERMAL_NO_LIMIT 2> which would allow it to either not cool and run up
>>> to 600MHz and not exceed, is that correct?
>>
>> Nope, it will mean the cooling device can only reduce to 800MHz and to
>> 600MHz to mitigate.
>>
>> Actually neither the thermal framework nor the kernel is designed to
>> handle this case. They assume the OPPs are stable whatever the thermal
>> situation.
>>
>> That is the reason why I think it is a very interesting use case because
>> it introduces a temperature constraint in addition to a duration for a
>> certain OPP. IMO, that could be an extension of the turbo-mode.
>>
>> With what we have now, I doubt it is feasible.
>>
>> The best we can do is prevent the temperature from reaching 90°C, so we
>> remove the OPP temperature constraint. I suppose 85°C is a safe
>> temperature to stick to.
>>
>> And in order to let the governor have a free hand:
>>
>> <&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT>
>>
>> I don't think that will have a significant impact on performance
>> compared to being able to run at a higher temperature with fewer OPPs.
>>
>>
>>>> Does it clarify the DT spec?
>>>>
>>>
>>> I think your reply to my inquiry might. If possible, it would be nice
>>> to get this documented into the bindings doc for others in the future.
>>> I can do it, but someone with a better understanding of the concept
>>> may be more qualified. I can totally understand why some may want to
>>> integrate this into their SoC device trees to slow the processor when
>>> hot.
>>>
>>> Thank you for taking the time to review this. I appreciate it.
>>>
>>> adam
>
> BR,
> Nikolaus
>


--
<http://www.linaro.org/> Linaro.org | Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog