Re: [PATCH 3/3] arm64: zynqmp: Add thermal zones
From: Sean Anderson
Date: Tue Sep 03 2024 - 10:29:57 EST
On 9/3/24 04:32, Thangaraj, Senthil Nathan wrote:
> Hi Sean,
>
> Please find my review comments inline.
>
> Thanks,
> Senthil.
>
>> -----Original Message-----
>> From: Sean Anderson <sean.anderson@xxxxxxxxx>
>> Sent: Monday, August 12, 2024 2:51 PM
>> To: Simek, Michal <michal.simek@xxxxxxx>; linux-arm-
>> kernel@xxxxxxxxxxxxxxxxxxx
>> Cc: Rob Herring <robh@xxxxxxxxxx>; Conor Dooley <conor+dt@xxxxxxxxxx>;
>> Krzysztof Kozlowski <krzk+dt@xxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx;
>> devicetree@xxxxxxxxxxxxxxx; Sean Anderson <sean.anderson@xxxxxxxxx>
>> Subject: [PATCH 3/3] arm64: zynqmp: Add thermal zones
>>
>> Add some thermal trip points. We can't undervolt the CPUs to save power
>> when we underclock them, so there isn't really a point in throttling them until
>> we are about to overheat. As such, the passive trip point is right below the
>> critical trip point.
>>
>> The critical trip point is the extended/industrial-grade maximum junction
>> temperature of 100C minus the maximum temperature sensor error of 3.5C
>> (in the range -55C to 110C). Automotive- and military-grade parts can go up
>> to 125C, but as far as I can tell there is no way to detect them at runtime.
>> Userspace can adjust the trip points at runtime, but this may not be viable
>> when booting above 100C. I think it's reasonable to ask automotive/military
>> users to edit their device trees to bump the trip points, but if that proves to be
>> an issue we can always go with no default temperatures. However, that
>> wouldn't be too nice for the majority of extended/industrial users.
>>
>> Signed-off-by: Sean Anderson <sean.anderson@xxxxxxxxx>
>> ---
>>
>> arch/arm64/boot/dts/xilinx/zynqmp.dtsi | 86
>> ++++++++++++++++++++++++++
>> 1 file changed, 86 insertions(+)
>>
>> diff --git a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
>> b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
>> index 21c1adbaf35f..467f084c6469 100644
>> --- a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
>> +++ b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
>> @@ -18,6 +18,7 @@
>> #include <dt-bindings/interrupt-controller/irq.h>
>> #include <dt-bindings/power/xlnx-zynqmp-power.h>
>> #include <dt-bindings/reset/xlnx-zynqmp-resets.h>
>> +#include <dt-bindings/thermal/thermal.h>
>>
>> / {
>> compatible = "xlnx,zynqmp";
>> @@ -36,6 +37,7 @@ cpus {
>> #size-cells = <0>;
>>
>> cpu0: cpu@0 {
>> + #cooling-cells = <2>;
>> compatible = "arm,cortex-a53";
>> device_type = "cpu";
>> enable-method = "psci";
>> @@ -46,6 +48,7 @@ cpu0: cpu@0 {
>> };
>>
>> cpu1: cpu@1 {
>> + #cooling-cells = <2>;
>> compatible = "arm,cortex-a53";
>> device_type = "cpu";
>> enable-method = "psci";
>> @@ -56,6 +59,7 @@ cpu1: cpu@1 {
>> };
>>
>> cpu2: cpu@2 {
>> + #cooling-cells = <2>;
>> compatible = "arm,cortex-a53";
>> device_type = "cpu";
>> enable-method = "psci";
>> @@ -66,6 +70,7 @@ cpu2: cpu@2 {
>> };
>>
>> cpu3: cpu@3 {
>> + #cooling-cells = <2>;
>> compatible = "arm,cortex-a53";
>> device_type = "cpu";
>> enable-method = "psci";
>> @@ -406,6 +411,87 @@ ams {
>> <&xilinx_ams 27>, <&xilinx_ams 28>, <&xilinx_ams
>> 29>;
>> };
>>
>> +
>> + tsens_apu: thermal-sensor-apu {
>> + compatible = "generic-adc-thermal";
>> + #thermal-sensor-cells = <0>;
>> + io-channels = <&xilinx_ams 7>;
>> + io-channel-names = "sensor-channel";
>> + };
>> +
>> + tsens_rpu: thermal-sensor-rpu {
>> + compatible = "generic-adc-thermal";
>> + #thermal-sensor-cells = <0>;
>> + io-channels = <&xilinx_ams 8>;
>> + io-channel-names = "sensor-channel";
>> + };
>> +
>> + tsens_pl: thermal-sensor-pl {
>> + compatible = "generic-adc-thermal";
>> + #thermal-sensor-cells = <0>;
>> + io-channels = <&xilinx_ams 20>;
>> + io-channel-names = "sensor-channel";
>> + };
>> +
>> + thermal-zones {
>> + apu-thermal {
>> + polling-delay-passive = <1000>;
>> + polling-delay = <5000>;
>
> How did we arrive at these delays, 1000 and 5000 ? could you please clarify ?
They're just arbitrary. Feel free to suggest other numbers. I could find
no guidance on this matter (or anything else thermal-related).
>> + thermal-sensors = <&tsens_apu>;
>> +
>> + trips {
>> + apu_passive: passive {
>> + temperature = <93000>;
>> + hysteresis = <3500>;
>> + type = "passive";
>> + };
>> +
>> + apu_critical: critical {
>> + temperature = <96500>;
>> + hysteresis = <3500>;
>> + type = "critical";
>> + };
>> + };
>> +
>> + cooling-maps {
>> + map {
>> + trip = <&apu_passive>;
>> + cooling-device =
>> + <&cpu0 THERMAL_NO_LIMIT
>> THERMAL_NO_LIMIT>,
>> + <&cpu1 THERMAL_NO_LIMIT
>> THERMAL_NO_LIMIT>,
>> + <&cpu2 THERMAL_NO_LIMIT
>> THERMAL_NO_LIMIT>,
>> + <&cpu3 THERMAL_NO_LIMIT
>> THERMAL_NO_LIMIT>;
>> + };
>> + };
>> + };
>> +
>> + rpu-thermal {
>> + polling-delay = <10000>;
>
> Same questions, how did we come up with number 10000
>
>> + thermal-sensors = <&tsens_rpu>;
>> +
>> + trips {
>> + critical {
>> + temperature = <96500>;
>> + hysteresis = <3500>;
>> + type = "critical";
>> + };
>
> Any reason why we haven't defined for passive trip point for RPU ?
What is there to do? It is up to the RPU software to do something if the
RPU is going to overheat. But the more-likely occurrence is that the APU
is overheating and the heat is spreading to the RPU. In which case, the
APU passive trip point will handle things.
>> + };
>> + };
>> +
>> + pl-thermal {
>> + polling-delay = <10000>;
>> + thermal-sensors = <&tsens_pl>;
>> +
>> + trips {
>> + critical {
>> + temperature = <96500>;
>> + hysteresis = <3500>;
>> + type = "critical";
>> + };
>> + };
> Same question, any reason why we haven't defined for passive trip point for PL ?
>> + };
>> + };
>> +
>> amba: axi {
>> compatible = "simple-bus";
>> bootph-all;
>> --
>> 2.35.1.1320.gc452695387.dirty
>