Re: [RFC] ARM: dts: omap36xx: Enable thermal throttling

From: H. Nikolaus Schaller
Date: Fri Sep 13 2019 - 16:35:01 EST


Hi Daniel,

> Am 13.09.2019 um 22:11 schrieb Daniel Lezcano <daniel.lezcano@xxxxxxxxxx>:
>
> On 13/09/2019 20:46, Adam Ford wrote:
>> On Fri, Sep 13, 2019 at 12:18 PM Daniel Lezcano
>> <daniel.lezcano@xxxxxxxxxx> wrote:
>>>
>>> On 13/09/2019 18:51, H. Nikolaus Schaller wrote:
>>>
>>> [ ... ]
>>>
>>>>> Good news (I think)
>>>>>
>>>>> With cooling-device = <&cpu 1 2> setup, I was able to ask the max
>>>>> frequency and it returned 600MHz.
>>>>>
>>>>> # cat /sys/devices/virtual/thermal/thermal_zone0/temp
>>>>> 58500
>>>>> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies
>>>>> 300000 600000 800000
>>>>> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
>>>>> 600000
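The check in the transcript above can also be done from a script. This is a minimal sketch that assumes the policy0 sysfs layout shown in the transcript. The helper names `usable_freqs` and `read_policy` are hypothetical, and the parsing is exercised on literal strings, so it can run on a machine without cpufreq.

```python
from pathlib import Path

POLICY = Path("/sys/devices/system/cpu/cpufreq/policy0")  # path from the transcript above


def usable_freqs(available: str, max_freq: str) -> list[int]:
    """Return the available frequencies (kHz) that do not exceed scaling_max_freq."""
    cap = int(max_freq.strip())
    return [f for f in map(int, available.split()) if f <= cap]


def read_policy(policy: Path = POLICY) -> list[int]:
    """Read both sysfs attributes and apply usable_freqs (needs a cpufreq-enabled board)."""
    available = (policy / "scaling_available_frequencies").read_text()
    max_freq = (policy / "scaling_max_freq").read_text()
    return usable_freqs(available, max_freq)


if __name__ == "__main__":
    # Values copied from the transcript above.
    print(usable_freqs("300000 600000 800000\n", "600000\n"))  # [300000, 600000]
```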
>>>>
>>>> looks good!
>>>> But we have to understand what the <&cpu 1 2> exactly means...
>>>>
>>>> Hopefully someone reading your RFCv2 can answer...
>>>
>> Daniel,
>>
>> Thank you for replying.
>>
>>> I may have missed the question :)
>>>
>>> These are the states allowed for the cooling device (the ones you can see
>>> in /sys/class/thermal/cooling_device0/max_state). As the logic is
>>> inverted for cpufreq, that can be confusing.
>>
>> I think that's what has me confused.
>>
>>>
>>> If it were a fan with, let's say, 5 speeds, you would use <&fan 0 5>, so
>>> when the mitigation begins the cooling device state is 0, and then the
>>> thermal governor increases the state until it sees a cooling effect.
>>>
>>> If <&fan 0 2> is set, the governor won't set a state above 2 even if the
>>> temperature increases.
>>
>> I am not sure I know what you mean by 'state' in this context.
>
> A thermal zone is managed by the thermal framework as follows:
> - a sensor
> - a governor
> - a cooling device
>
> The governor gets the temperature via the sensor and depending on the
> temperature it will increase or decrease the cooling effect of the
> cooling device. With a fan, that means it will increase or decrease its
> speed. With cpufreq, it will decrease or increase the OPP.
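The three roles can be pictured as a small control loop. This is a simplified sketch, not the kernel's thermal framework API: the class names, the `read_temp` callable and the one-step policy are hypothetical, chosen only to mirror the description above (the sensor reports a temperature, the governor raises or lowers a discrete cooling state on the cooling device).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CoolingDevice:
    """A device with discrete states 0..max_state (0 = no cooling effect)."""
    max_state: int
    state: int = 0


@dataclass
class ThermalZone:
    read_temp: Callable[[], int]  # the sensor, in millidegrees Celsius as in sysfs
    trip: int                     # trip point, millidegrees Celsius
    cdev: CoolingDevice           # the cooling device driven by the governor

    def govern_once(self) -> int:
        """One governor pass: more cooling above the trip point, less below it."""
        temp = self.read_temp()
        if temp > self.trip and self.cdev.state < self.cdev.max_state:
            self.cdev.state += 1   # fan faster / lower OPP
        elif temp < self.trip and self.cdev.state > 0:
            self.cdev.state -= 1   # fan slower / higher OPP
        return self.cdev.state


if __name__ == "__main__":
    readings = iter([80000, 81000, 70000])  # hypothetical sensor samples
    zone = ThermalZone(read_temp=lambda: next(readings), trip=75000,
                       cdev=CoolingDevice(max_state=3))
    print([zone.govern_once() for _ in range(3)])  # [1, 2, 1]
```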
>
> These are discrete values the governor uses to set the cooling
> effect. The state is one of these values (the current speed or the
> current OPP index).
>
> Depending on the cooling device, the number of states differs.
>
> In the context above, the fan cooling device can be stopped (state=0),
> running (state=1) or running faster (state=2).
>
> As the node says to use no more than 2, the governor will never go
> to running much faster (state=3). (That's an example.)
>
>>> When the cooling driver is able to return the number of states it
>>> supports, it is safe to set the states to THERMAL_NO_LIMIT and let the
>>> governor find the balance point.
>>
>> If the cooling driver is using cpufreq, is the number of supported
>> states equal to the number of operating points given to cpufreq?
>
> Yes, absolutely, if THERMAL_NO_LIMIT is set [1] (which is what is done
> in most cases). Otherwise it will use the boundaries set in <&cpu
> limit_low limit_high>.
>
> When changing the limits, a state=1 has a different meaning.
>
> For example: 7 OPPs available
>
> <&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT> : state=[0..7]
>
> <&cpu 0 2> : state=[0..2] (1, 2)
>
> <&cpu 5 7> : state=[0..3] (5, 6, 7)
>
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/thermal/cpu_cooling.c#n334
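How a `<&cpu lower upper>` pair narrows the state range can be sketched as follows. Assumptions: `THERMAL_NO_LIMIT` is modelled as a Python sentinel (the real constant comes from the DT bindings header), a no-limit lower bound resolves to 0 and a no-limit upper bound to the driver's `max_state`, and the governor's requested state is simply clamped into that range. `resolve_limits` and `clamp_state` are hypothetical names; `max_state=7` only follows the thread's 7-OPP example.

```python
THERMAL_NO_LIMIT = object()  # stand-in sentinel, not the kernel's value


def resolve_limits(lower, upper, max_state: int) -> tuple[int, int]:
    """Turn a <&cdev lower upper> pair into concrete state bounds."""
    lo = 0 if lower is THERMAL_NO_LIMIT else lower
    hi = max_state if upper is THERMAL_NO_LIMIT else upper
    if not 0 <= lo <= hi <= max_state:
        raise ValueError(f"bad cooling limits {lo}..{hi} for max_state {max_state}")
    return lo, hi


def clamp_state(requested: int, lo: int, hi: int) -> int:
    """The governor may ask for any state; it only ever gets one inside lo..hi."""
    return max(lo, min(requested, hi))


if __name__ == "__main__":
    N = THERMAL_NO_LIMIT
    cases = [("NO_LIMIT NO_LIMIT", N, N), ("NO_LIMIT 2", N, 2), ("0 2", 0, 2),
             ("1 2", 1, 2), ("5 7", 5, 7)]
    for label, lower, upper in cases:
        lo, hi = resolve_limits(lower, upper, max_state=7)
        print(f"<&cpu {label}>: states {lo}..{hi}, request 0 -> {clamp_state(0, lo, hi)}")
```

In this model `<&cpu THERMAL_NO_LIMIT 2>` and `<&cpu 0 2>` resolve to the same range, and `<&cpu 1 2>` never lets the state drop back to 0, which matches the answers given further down in the thread.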
>
>>> Now if the cooling device is cpufreq, the state order is inverted,
>>> because the cooling effect happens when decreasing the OPP.
>>>
>>> If the board supports 7 OPPs, state 0 is OPP 7 - 0, so no mitigation; if
>>> the state is 1, cpufreq is throttled to the 6th OPP, 2 to the 5th OPP,
>>> etc.
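The inversion boils down to a one-line mapping. This sketch uses the numbering from the paragraph above (OPPs counted 1..N from slowest to fastest, state 0 meaning the fastest OPP) and assumes a board with N OPPs exposes states 0..N-1. `opp_for_state` is a hypothetical helper, not a kernel function.

```python
def opp_for_state(state: int, n_opps: int) -> int:
    """Return the 1-based OPP number a cpufreq cooling state caps the CPU at."""
    if not 0 <= state < n_opps:
        raise ValueError(f"state {state} out of range for {n_opps} OPPs")
    return n_opps - state  # state 0 -> fastest OPP, each step drops one OPP


if __name__ == "__main__":
    for s in range(3):
        print(f"state {s} -> OPP {opp_for_state(s, 7)}")
    # state 0 -> OPP 7, state 1 -> OPP 6, state 2 -> OPP 5
```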
>>
>> I am not sure how the state would be set to 2.
>
> That is a governor decision. Let me give an example with a hikey960
> board, which has very fast temperature transitions, so it is simpler to
> illustrate the behavior. The trip point is 75°C.
>
> Imagine the CPU gets 100% loaded: cpufreq sets the OPP to the max
> (2.36GHz). As the temperature is still under 75°C, there is no
> mitigation yet, so the cooling device state is 0.
>
> Within a few seconds the temperature reaches 75°C, which triggers the
> monitoring of the thermal zone, and the mitigation begins. The
> temperature then continues to increase very quickly to 78°C; the governor
> sees we are above the trip point and increments the cooling device state
> (state=>1). That leads to an OPP change from 2.36GHz to 2.11GHz.
>
> The governor continues to read the temperature and sees it
> is still increasing (even if more slowly), so it
> increases the state again (state=>2). That leads to an OPP change from
> 2.11GHz to 1.8GHz.
>
> The governor continues to read the temperature and sees it
> decreasing, so it does nothing.

Ah, I think our misunderstanding was assuming that the governor "enables"
and "disables" a set of OPPs. Rather, it goes down or up the list when
above or below a trip point.

>
> The governor continues to read the temperature, sees it
> decreasing and below 75°C, so it decreases the state (state=>1) and the
> OPP changes to 2.11GHz.
>
> The temperature then increases, etc ...
>
> Actually the governors do more than that, but this is just for the example.
>
> So it is a bad idea to set boundaries for the cooling device state, as
> that may prevent the governor from taking the right decision for the
> cooling effect. Imagine that in the example above we set the max state
> to 1 for the cooling device: the governor would not be able to stop the
> temperature from increasing, thus ending up with a hard reboot.
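The hikey960 walkthrough above can be replayed with a toy governor. Everything here is a hypothetical simplification: only the three OPPs named above (2.36, 2.11 and 1.8 GHz) are modelled, the temperature trace is made up, and the policy is reduced to the three rules from the example (above the trip and rising: one more state; above the trip and not rising: hold; below the trip: one less state). As noted above, the real governors do more. Passing `max_state=1` reproduces the failure mode from the last paragraph: the state saturates while the temperature keeps climbing.

```python
OPPS_GHZ = [2.36, 2.11, 1.8]  # index = cooling state (inverted: higher state, lower OPP)
TRIP_C = 75


def run_governor(temps_c: list[int], max_state: int = len(OPPS_GHZ) - 1) -> list[int]:
    """Replay a temperature trace and return the cooling state after each sample."""
    state, prev, states = 0, None, []
    for t in temps_c:
        rising = prev is not None and t > prev
        if t > TRIP_C and rising:
            state = min(state + 1, max_state)  # mitigate harder, if the bound allows
        elif t < TRIP_C:
            state = max(state - 1, 0)          # back off towards no mitigation
        # above the trip but not rising: hold the current state
        states.append(state)
        prev = t
    return states


if __name__ == "__main__":
    trace = [70, 75, 78, 79, 77, 74]  # hypothetical samples, degrees Celsius
    states = run_governor(trace)
    print(states)                         # [0, 0, 1, 2, 2, 1]
    print([OPPS_GHZ[s] for s in states])  # [2.36, 2.36, 2.11, 1.8, 1.8, 2.11]
    print(run_governor([70, 78, 80, 82, 84, 86], max_state=1))  # [0, 1, 1, 1, 1, 1]
```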

Well, the data sheet only requires that the high-speed OPPs are used
only below 90°C. If I understand correctly, if we set the trip point to
90°C, the governor will simply go down through the full list of OPPs.
This will clearly avoid the high-speed OPPs (and potentially some
low-speed ones, but that does no harm).

So our approach of "how to make it disable these two OPPs" seems to be
wrong. Rather, we have to think "make sure the temperature
stays below 90°C".

And is it true that we do not have to define a mapping for the "critical"
trip points?

>
>>> Now the different combinations:
>>>
>>> <&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT>: the governor will use states
>>> 0 to 7.
>>>
>>> <&cpu THERMAL_NO_LIMIT 2>: the governor will use states 0 to 2.
>>
>> What would be the difference between <&cpu THERMAL_NO_LIMIT 2> and
>> <&cpu 0 2> ?
>> (if there is any)
>
> There is no difference.
>
>
>>> <&cpu 1 2>: the governor will use states 1 and 2. That means there is
>>> always a cooling effect, as the governor won't set the state to zero
>>> and thus stop the mitigation.
>>
>> For the purposes of the board in question, we have 4 operating points,
>> 300MHz, 600MHz, 800MHz and 1GHz. Once the board reaches 90C, we need
>> them to cease operation at 800MHz and 1GHz and only permit operation
>> at 300MHz and 600MHz. I am going under the assumption that the cpu
>> index[0] would be for 300MHz, index[1] = 600MHz, etc.
>>
>> If I am interpreting your comment correctly, I should set <&cpu
>> THERMAL_NO_LIMIT 2> which would allow it to either not cool and run up
>> to 600MHz and not exceed, is that correct?
>
> Nope, it will mean the cooling device can only reduce to 800MHz and to
> 600MHz to mitigate.
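Applied to the board in question, the inverted mapping explains this answer. A sketch assuming the four OPPs listed (300MHz, 600MHz, 800MHz, 1GHz), states 0..3 with state 0 meaning 1GHz, and `<&cpu THERMAL_NO_LIMIT 2>` resolving to states 0..2. `reachable_freqs` is a hypothetical helper.

```python
OPPS_MHZ = (300, 600, 800, 1000)  # ascending, as listed for the board


def reachable_freqs(upper_state: int, opps: tuple[int, ...] = OPPS_MHZ) -> list[int]:
    """Frequency caps the cooling device can impose for states 0..upper_state."""
    max_state = len(opps) - 1
    if not 0 <= upper_state <= max_state:
        raise ValueError(f"upper state {upper_state} exceeds max_state {max_state}")
    # Inverted order: state s caps the CPU at the (s+1)-th fastest OPP.
    return [opps[max_state - s] for s in range(upper_state + 1)]


if __name__ == "__main__":
    print(reachable_freqs(2))  # [1000, 800, 600]: no cap, then 800MHz, then 600MHz
    print(reachable_freqs(3))  # [1000, 800, 600, 300]: the full list, no upper limit
```

With an upper limit of 2 the governor can only throttle down to 800MHz and 600MHz and never reaches 300MHz, while state 0 still leaves the CPU uncapped. So the bound does not express "only 300MHz and 600MHz above 90°C".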
>
> Actually, neither the thermal framework nor the kernel is designed to
> handle this case. They assume the OPPs are stable whatever the thermal
> situation.
>
> That is the reason why I think it is a very interesting use case: it
> introduces a temperature constraint in addition to a duration for a
> certain OPP. IMO, that could be an extension of turbo mode.
>
> With what we have now, I doubt it is feasible.
>
> The best we can do is prevent reaching 90°C, so that we remove the OPP
> temperature constraint. I suppose 85°C is a safe temperature to stick to.
>
> And in order to give the governor a free hand:
>
> <&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT>
>
> I don't think that will have a significant impact on performance
> compared to being able to run at a higher temperature with fewer OPPs.
>
>
>>> Does it clarify the DT spec?
>>>
>>
>> I think your reply to my inquiry might. If possible, it would be nice
>> to get this documented in the bindings doc for others in the future.
>> I can do it, but someone with a better understanding of the concept
>> may be more qualified. I can totally understand why some may want to
>> integrate this into their SoC device trees to slow the processor when
>> hot.
>>
>> Thank you for taking the time to review this. I appreciate it.
>>
>> adam

BR,
Nikolaus