Re: [PATCH] Fix mt7622.dtsi thermal cpu
From: Eric Woudstra
Date: Thu Jun 24 2021 - 09:29:45 EST
The SOC runs unthrotlled slowly to 80 degrees. This takes minutes. Polling interval 1 second or less does not matter much when looking at these temperature rise times
After that in more then an hour it slowly creeps up to 85. I believe the design is so that the SOC, under normal circumstances, can run at 1.35 GHz without throttling frequency, without heatsink. It just needs a safeguard for different circumstances.
Most of these SOCs can also run in industrial grade circumstances, which means up to 85 degrees ambient temperature already . If not industrial then this would be 60 degrees ambient already
But only someone at Mediatek can confirm this
Get BlueMail for Android
On Jun 24, 2021, 12:21 PM, at 12:21 PM, Daniel Lezcano <daniel.lezcano@xxxxxxxxxx> wrote:
>On 24/06/2021 11:59, Eric Woudstra wrote:
>>
>> For Marvell:
>>
>>
>https://www.google.com/url?sa=t&source=web&rct=j&url=https://wiki.kobol.io/helios4/files/som/brochure_a38x_microsom_2017-09-05.pdf
>>
>> Armada38x maximum die temperature 115 degrees Celcius. They really
>get hotter then 100.
>>
>> But for mt7622 I cannot find this value
>
>Found that:
>
>https://download.kamami.pl/p579344-MT7622A_Datasheet_for_BananaPi_Only%281%29.pdf
>
>Chapter 3.3 - Thermal Characteristics
>
>Given the values I suggest:
>
> - Passive - 80°C
>
> - Hot - 90°C
>
> - Critical - 100°C
>
>And passive polling set to 250ms.
>
>It sounds like the sensor is not supporting the interrupt mode yet, so
>a
>big gap is needed with the Tj IMO to give the time to detect the trip
>point crossing with the polling.
>
>> Get BlueMail for Android
>>
>> On Jun 23, 2021, 10:08 PM, at 10:08 PM, Daniel Lezcano
><daniel.lezcano@xxxxxxxxxx> wrote:
>>> On 23/06/2021 20:43, Eric Woudstra wrote:
>>>>
>>>> I choose "hot" before, because 87 degrees seems ok to start
>frequency
>>>> throttling. But, yes, it should be passive.
>>>>
>>>> 87 is still quite low if I compare this temperature with the
>>>> wrt3200acm Marvell dual core arm soc. They even went above 100
>>>> degrees so I feel for an arm processor inside a router box it is
>fine
>>>> to use 87 degrees But maybe someone at Mediatek can give some more
>>>> details about operating temperatures.
>>>
>>> Sometimes, the SoC vendor puts a high temperature in the DT just to
>>> export the thermal zone and deal with it from userspace. So putting
>the
>>> high temp allow the userspace (usually a thermal engine - Android
>>> stuff)
>>> to deal with the mitigation without a kernel interaction.
>>>
>>> Having more than 100°C could be this kind of setup. Only the
>operating
>>> temperature from the hardware documentation will tell the safe
>>> temperature for the silicon.
>>>
>>> IMO, 77°C is a good compromise until getting the documented temp.
>87°C
>>> sounds to me a bit too hot.
>>>
>>>> It may be possible to leave the active map in the device tree as
>some
>>>> users of the bananapi might choose to install a fan as it is one of
>>>> the options.
>>>
>>> The active trip only makes sense if the cooling device is a fan (or
>any
>>> active device), so the mapping points to a fan node, like:
>>>
>>>
>https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/tree/arch/arm64/boot/dts/rockchip/rk3399-khadas-edge.dtsi#n192
>>>
>>> If there is no such [pwm] fan output on the board, no active trip
>point
>>> should be added.
>>>
>>>> Get BlueMail for Android
>>>>
>>>> On Jun 23, 2021, 5:58 PM, at 5:58 PM, Daniel Lezcano
>>>> <daniel.lezcano@xxxxxxxxxx> wrote:
>>>>> On 23/06/2021 17:35, Eric Woudstra wrote:
>>>>>> It is only useful to set 1 map with the regulated temperature for
>>>>>> cpu frequency throttling. Same as in the kernel document
>>>>>> example.
>>>>>>
>>>>>>
>>>>>> It has no use to set frequency scaling on 2 different
>>>>>> temperature trip points, as the lowest one makes sure the higher
>>>>>> one(s) are never reached.
>>>>>
>>>>> I looked more closely the DT and there is a misunderstanding of
>>>>> the thermal framework in the definition.
>>>>>
>>>>> There is one trip point with the passive type and the cpu cooling
>>>>> device, followed by a second trip point with the active type *but*
>>>>> the same cpu cooling device. That is wrong.
>>>>>
>>>>> And finally, there is the hot trip point as a third mapping and
>>>>> the same cooling device.
>>>>>
>>>>> The hot trip point is only there to notify userspace and let it
>>>>> take an immediate action to prevent an emergency shutdown when
>>>>> reaching the critical temperature.
>>>>>
>>>>>> It can be applied only at 1 trip point. Multiple trip points is
>>>>>> only usefully for fan control to make sure the fan is not too
>>>>>> noisy when it is not necessary to be noisy.
>>>>>>
>>>>>>
>>>>>> The CPU will almost come to a dead stop when it starts to pass
>>>>>> the lowest thermal map with frequency throttling.
>>>>>>
>>>>>> This is why it is a bug and needs a fix, not only adjustment.
>>>>>
>>>>> Yes, you are right. It should be something like (verbatim copy):
>>>>>
>>>>> diff --git a/arch/arm64/boot/dts/mediatek/mt7622.dtsi
>>>>> b/arch/arm64/boot/dts/mediatek/mt7622.dtsi index
>>>>> 890a942ec608..88c81d24f4ff 100644 ---
>>>>> a/arch/arm64/boot/dts/mediatek/mt7622.dtsi +++
>>>>> b/arch/arm64/boot/dts/mediatek/mt7622.dtsi @@ -136,24 +136,18 @@
>>>>> secmon_reserved: secmon@43000000 {
>>>>>
>>>>> thermal-zones { cpu_thermal: cpu-thermal { -
>>>>> polling-delay-passive = <1000>; + polling-delay-passive = <250>;
>
>>>>> polling-delay = <1000>;
>>>>>
>>>>> thermal-sensors = <&thermal 0>;
>>>>>
>>>>> trips { cpu_passive: cpu-passive { - temperature = <47000>; +
>>>>> temperature = <77000>; hysteresis = <2000>; type = "passive"; };
>>>>>
>>>>> - cpu_active: cpu-active { - temperature = <67000>; -
>>>>> hysteresis = <2000>; - type = "active"; - }; - cpu_hot:
>>>>> cpu-hot { temperature = <87000>; hysteresis = <2000>; @@ -173,18
>>>>> +167,6 @@ map0 { cooling-device = <&cpu0 THERMAL_NO_LIMIT
>>>>> THERMAL_NO_LIMIT>, <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>; }; -
>
>>>>> - map1 { - trip = <&cpu_active>; - cooling-device =
>>>>> <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>, - <&cpu1
>>>>> THERMAL_NO_LIMIT THERMAL_NO_LIMIT>; - }; - - map2 { -
>>>>> trip = <&cpu_hot>; - cooling-device = <&cpu0 THERMAL_NO_LIMIT
>>>>> THERMAL_NO_LIMIT>, - <&cpu1 THERMAL_NO_LIMIT
>>>>> THERMAL_NO_LIMIT>; - }; }; }; };
>>>>>
>>>>>
>>>>> -- <http://www.linaro.org/> Linaro.org │ Open source software for
>>>>> ARM SoCs
>>>>>
>>>>> Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
>>>>> <http://twitter.com/#!/linaroorg> Twitter |
>>>>> <http://www.linaro.org/linaro-blog/> Blog
>>>>
>>>
>>>
>>> --
>>> <http://www.linaro.org/> Linaro.org │ Open source software for ARM
>SoCs
>>>
>>> Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
>>> <http://twitter.com/#!/linaroorg> Twitter |
>>> <http://www.linaro.org/linaro-blog/> Blog
>>
>
>
>--
><http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
>
>Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
><http://twitter.com/#!/linaroorg> Twitter |
><http://www.linaro.org/linaro-blog/> Blog