RE: [PATCH 3/3] arm64: zynqmp: Add thermal zones
From: Thangaraj, Senthil Nathan
Date: Mon Sep 09 2024 - 14:29:09 EST
Hi Sean,
Thank you for your response.
Unfortunately, I’m not the right person to recommend the thermal polling delay. I’m looping Matthew into this thread for further assistance.
Hi Matthew,
Could you or your team please provide a recommendation on this?
Thanks,
Senthil
> -----Original Message-----
> From: Sean Anderson <sean.anderson@xxxxxxxxx>
> Sent: Tuesday, September 3, 2024 7:28 AM
> To: Thangaraj, Senthil Nathan <SenthilNathan.Thangaraj@xxxxxxx>; Simek,
> Michal <michal.simek@xxxxxxx>; linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> Cc: Rob Herring <robh@xxxxxxxxxx>; Conor Dooley <conor+dt@xxxxxxxxxx>;
> Krzysztof Kozlowski <krzk+dt@xxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx;
> devicetree@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH 3/3] arm64: zynqmp: Add thermal zones
>
> Caution: This message originated from an External Source. Use proper caution
> when opening attachments, clicking links, or responding.
>
>
> On 9/3/24 04:32, Thangaraj, Senthil Nathan wrote:
> > Hi Sean,
> >
> > Please find my review comments inline.
> >
> > Thanks,
> > Senthil.
> >
> >> -----Original Message-----
> >> From: Sean Anderson <sean.anderson@xxxxxxxxx>
> >> Sent: Monday, August 12, 2024 2:51 PM
> >> To: Simek, Michal <michal.simek@xxxxxxx>; linux-arm-
> >> kernel@xxxxxxxxxxxxxxxxxxx
> >> Cc: Rob Herring <robh@xxxxxxxxxx>; Conor Dooley
> >> <conor+dt@xxxxxxxxxx>; Krzysztof Kozlowski <krzk+dt@xxxxxxxxxx>;
> >> linux-kernel@xxxxxxxxxxxxxxx; devicetree@xxxxxxxxxxxxxxx; Sean
> >> Anderson <sean.anderson@xxxxxxxxx>
> >> Subject: [PATCH 3/3] arm64: zynqmp: Add thermal zones
> >>
> >> Add some thermal trip points. We can't undervolt the CPUs to save
> >> power when we underclock them, so there isn't really a point in
> >> throttling them until we are about to overheat. As such, the passive
> >> trip point is right below the critical trip point.
> >>
> >> The critical trip point is the extended/industrial-grade maximum
> >> junction temperature of 100C minus the maximum temperature sensor
> >> error of 3.5C (in the range -55C to 110C). Automotive- and
> >> military-grade parts can go up to 125C, but as far as I can tell there is no
> way to detect them at runtime.
> >> Userspace can adjust the trip points at runtime, but this may not be
> >> viable when booting above 100C. I think it's reasonable to ask
> >> automotive/military users to edit their device trees to bump the trip
> >> points, but if that proves to be an issue we can always go with no
> >> default temperatures. However, that wouldn't be too nice for the majority
> of extended/industrial users.
> >>
> >> Signed-off-by: Sean Anderson <sean.anderson@xxxxxxxxx>
> >> ---
> >>
> >> arch/arm64/boot/dts/xilinx/zynqmp.dtsi | 86
> >> ++++++++++++++++++++++++++
> >> 1 file changed, 86 insertions(+)
> >>
> >> diff --git a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
> >> b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
> >> index 21c1adbaf35f..467f084c6469 100644
> >> --- a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
> >> +++ b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
> >> @@ -18,6 +18,7 @@
> >> #include <dt-bindings/interrupt-controller/irq.h>
> >> #include <dt-bindings/power/xlnx-zynqmp-power.h>
> >> #include <dt-bindings/reset/xlnx-zynqmp-resets.h>
> >> +#include <dt-bindings/thermal/thermal.h>
> >>
> >> / {
> >> compatible = "xlnx,zynqmp";
> >> @@ -36,6 +37,7 @@ cpus {
> >> #size-cells = <0>;
> >>
> >> cpu0: cpu@0 {
> >> + #cooling-cells = <2>;
> >> compatible = "arm,cortex-a53";
> >> device_type = "cpu";
> >> enable-method = "psci"; @@ -46,6 +48,7 @@ cpu0:
> >> cpu@0 {
> >> };
> >>
> >> cpu1: cpu@1 {
> >> + #cooling-cells = <2>;
> >> compatible = "arm,cortex-a53";
> >> device_type = "cpu";
> >> enable-method = "psci"; @@ -56,6 +59,7 @@ cpu1:
> >> cpu@1 {
> >> };
> >>
> >> cpu2: cpu@2 {
> >> + #cooling-cells = <2>;
> >> compatible = "arm,cortex-a53";
> >> device_type = "cpu";
> >> enable-method = "psci"; @@ -66,6 +70,7 @@ cpu2:
> >> cpu@2 {
> >> };
> >>
> >> cpu3: cpu@3 {
> >> + #cooling-cells = <2>;
> >> compatible = "arm,cortex-a53";
> >> device_type = "cpu";
> >> enable-method = "psci"; @@ -406,6 +411,87 @@ ams
> >> {
> >> <&xilinx_ams 27>, <&xilinx_ams 28>, <&xilinx_ams
> >> 29>;
> >> };
> >>
> >> +
> >> + tsens_apu: thermal-sensor-apu {
> >> + compatible = "generic-adc-thermal";
> >> + #thermal-sensor-cells = <0>;
> >> + io-channels = <&xilinx_ams 7>;
> >> + io-channel-names = "sensor-channel";
> >> + };
> >> +
> >> + tsens_rpu: thermal-sensor-rpu {
> >> + compatible = "generic-adc-thermal";
> >> + #thermal-sensor-cells = <0>;
> >> + io-channels = <&xilinx_ams 8>;
> >> + io-channel-names = "sensor-channel";
> >> + };
> >> +
> >> + tsens_pl: thermal-sensor-pl {
> >> + compatible = "generic-adc-thermal";
> >> + #thermal-sensor-cells = <0>;
> >> + io-channels = <&xilinx_ams 20>;
> >> + io-channel-names = "sensor-channel";
> >> + };
> >> +
> >> + thermal-zones {
> >> + apu-thermal {
> >> + polling-delay-passive = <1000>;
> >> + polling-delay = <5000>;
> >
> > How did we arrive at these delays, 1000 and 5000 ? could you please clarify
> ?
>
> They're just arbitrary. Feel free to suggest other numbers. I could find no
> guidance on this matter (or anything else thermal-related).
>
> >> + thermal-sensors = <&tsens_apu>;
> >> +
> >> + trips {
> >> + apu_passive: passive {
> >> + temperature = <93000>;
> >> + hysteresis = <3500>;
> >> + type = "passive";
> >> + };
> >> +
> >> + apu_critical: critical {
> >> + temperature = <96500>;
> >> + hysteresis = <3500>;
> >> + type = "critical";
> >> + };
> >> + };
> >> +
> >> + cooling-maps {
> >> + map {
> >> + trip = <&apu_passive>;
> >> + cooling-device =
> >> + <&cpu0 THERMAL_NO_LIMIT
> >> THERMAL_NO_LIMIT>,
> >> + <&cpu1 THERMAL_NO_LIMIT
> >> THERMAL_NO_LIMIT>,
> >> + <&cpu2 THERMAL_NO_LIMIT
> >> THERMAL_NO_LIMIT>,
> >> + <&cpu3 THERMAL_NO_LIMIT
> >> THERMAL_NO_LIMIT>;
> >> + };
> >> + };
> >> + };
> >> +
> >> + rpu-thermal {
> >> + polling-delay = <10000>;
> >
> > Same questions, how did we come up with number 10000
> >
> >> + thermal-sensors = <&tsens_rpu>;
> >> +
> >> + trips {
> >> + critical {
> >> + temperature = <96500>;
> >> + hysteresis = <3500>;
> >> + type = "critical";
> >> + };
> >
> > Any reason why we haven't defined for passive trip point for RPU ?
>
> What is there to do? It is up to the RPU software to do something if the RPU is
> going to overheat. But the more-likely occurrence is that the APU is
> overheating and the heat is spreading to the RPU. In which case, the APU
> passive trip point will handle things.
>
> >> + };
> >> + };
> >> +
> >> + pl-thermal {
> >> + polling-delay = <10000>;
> >> + thermal-sensors = <&tsens_pl>;
> >> +
> >> + trips {
> >> + critical {
> >> + temperature = <96500>;
> >> + hysteresis = <3500>;
> >> + type = "critical";
> >> + };
> >> + };
> > Same question, any reason why we haven't defined for passive trip point for
> PL ?
> >> + };
> >> + };
> >> +
> >> amba: axi {
> >> compatible = "simple-bus";
> >> bootph-all;
> >> --
> >> 2.35.1.1320.gc452695387.dirty
> >