Re: [RFC v2 1/2] ARM: dts: omap3: Add cpu trips and cooling map for omap3 family

From: Adam Ford
Date: Sat Sep 14 2019 - 09:43:09 EST


On Sat, Sep 14, 2019 at 4:20 AM H. Nikolaus Schaller <hns@xxxxxxxxxxxxx> wrote:
>
>
> > On 13.09.2019 at 17:37, Adam Ford <aford173@xxxxxxxxx> wrote:
> >
> > The OMAP3530, AM3517 and DM3730 all show thresholds of 90C and 105C
> > depending on commercial or industrial temperature ratings. This
> > patch expands the thermal information to the limits of 90 and 105
> > for alert and critical.
> >
> > For boards that never use industrial temperatures, these can be
> > changed in their respective device trees with something like:
> >
> > &cpu_alert0 {
> > 	temperature = <85000>; /* millicelsius */
> > };
> >
> > &cpu_crit {
> > 	temperature = <90000>; /* millicelsius */
> > };
> >
> > Signed-off-by: Adam Ford <aford173@xxxxxxxxx>
> > ---
> > V2: Change the CPU reference to &cpu instead of &cpu0
> >
> > diff --git a/arch/arm/boot/dts/omap3-cpu-thermal.dtsi b/arch/arm/boot/dts/omap3-cpu-thermal.dtsi
> > index 235ecfd61e2d..dfbd0cb0b00b 100644
> > --- a/arch/arm/boot/dts/omap3-cpu-thermal.dtsi
> > +++ b/arch/arm/boot/dts/omap3-cpu-thermal.dtsi
> > @@ -17,4 +17,25 @@ cpu_thermal: cpu_thermal {
> >
> >  	/* sensor ID */
> >  	thermal-sensors = <&bandgap 0>;
> > +
> > +	cpu_trips: trips {
> > +		cpu_alert0: cpu_alert {
> > +			temperature = <90000>; /* millicelsius */
> > +			hysteresis = <2000>; /* millicelsius */
> > +			type = "passive";
> > +		};
> > +		cpu_crit: cpu_crit {
> > +			temperature = <105000>; /* millicelsius */
> > +			hysteresis = <2000>; /* millicelsius */
> > +			type = "critical";
> > +		};
> > +	};
> > +
> > +	cpu_cooling_maps: cooling-maps {
> > +		map0 {
> > +			trip = <&cpu_alert0>;
> > +			cooling-device =
> > +				<&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> > +		};
> > +	};
> >  };
> > --
> > 2.17.1
> >
>
> Here is my test log (GTA04A5 with DM3730CBP100).
> "high-load" script is driving the NEON to full power
> and would report calculation errors.
>
> There is no noise visible in the bandgap sensor data
> induced by power supply fluctuations (log shows system
> voltage while charging).
>

Great data!

> root@letux:~# ./high-load -n2
> 100% load stress test for 1 cores running ./neon_loop2
> Sat Sep 14 09:05:50 UTC 2019 65° 4111mV 1000MHz
> Sat Sep 14 09:05:50 UTC 2019 67° 4005mV 1000MHz
> Sat Sep 14 09:05:52 UTC 2019 68° 4000mV 1000MHz
> Sat Sep 14 09:05:53 UTC 2019 68° 4000mV 1000MHz
> Sat Sep 14 09:05:55 UTC 2019 72° 3976mV 1000MHz
> Sat Sep 14 09:05:56 UTC 2019 72° 4023mV 1000MHz
> Sat Sep 14 09:05:57 UTC 2019 72° 3900mV 1000MHz
> Sat Sep 14 09:05:59 UTC 2019 73° 4029mV 1000MHz
> Sat Sep 14 09:06:00 UTC 2019 73° 3988mV 1000MHz
> Sat Sep 14 09:06:01 UTC 2019 73° 4005mV 1000MHz
> Sat Sep 14 09:06:03 UTC 2019 73° 4011mV 1000MHz
> Sat Sep 14 09:06:04 UTC 2019 73° 4117mV 1000MHz
> Sat Sep 14 09:06:06 UTC 2019 73° 4005mV 1000MHz
> Sat Sep 14 09:06:07 UTC 2019 75° 3994mV 1000MHz
> Sat Sep 14 09:06:08 UTC 2019 75° 3970mV 1000MHz
> Sat Sep 14 09:06:09 UTC 2019 75° 4046mV 1000MHz
> Sat Sep 14 09:06:11 UTC 2019 75° 4005mV 1000MHz
> Sat Sep 14 09:06:12 UTC 2019 75° 4023mV 1000MHz
> Sat Sep 14 09:06:14 UTC 2019 75° 3970mV 1000MHz
> Sat Sep 14 09:06:15 UTC 2019 75° 4011mV 1000MHz
> Sat Sep 14 09:06:16 UTC 2019 77° 4017mV 1000MHz
> Sat Sep 14 09:06:18 UTC 2019 77° 3994mV 1000MHz
> Sat Sep 14 09:06:19 UTC 2019 77° 3994mV 1000MHz
> Sat Sep 14 09:06:20 UTC 2019 77° 3988mV 1000MHz
> Sat Sep 14 09:06:22 UTC 2019 77° 4023mV 1000MHz
> Sat Sep 14 09:06:23 UTC 2019 77° 4023mV 1000MHz
> Sat Sep 14 09:06:24 UTC 2019 78° 4005mV 1000MHz
> Sat Sep 14 09:06:26 UTC 2019 78° 4105mV 1000MHz
> Sat Sep 14 09:06:27 UTC 2019 78° 4011mV 1000MHz
> Sat Sep 14 09:06:28 UTC 2019 78° 3994mV 1000MHz
> Sat Sep 14 09:06:30 UTC 2019 78° 4123mV 1000MHz
> ...
> Sat Sep 14 09:09:57 UTC 2019 88° 4082mV 1000MHz
> Sat Sep 14 09:09:59 UTC 2019 88° 4164mV 1000MHz
> Sat Sep 14 09:10:00 UTC 2019 88° 4058mV 1000MHz
> Sat Sep 14 09:10:01 UTC 2019 88° 4058mV 1000MHz
> Sat Sep 14 09:10:03 UTC 2019 88° 4082mV 1000MHz
> Sat Sep 14 09:10:04 UTC 2019 88° 4058mV 1000MHz
> Sat Sep 14 09:10:06 UTC 2019 88° 4146mV 1000MHz
> Sat Sep 14 09:10:07 UTC 2019 88° 4041mV 1000MHz
> Sat Sep 14 09:10:08 UTC 2019 88° 4035mV 1000MHz
> Sat Sep 14 09:10:10 UTC 2019 88° 4052mV 1000MHz
> Sat Sep 14 09:10:11 UTC 2019 88° 4087mV 1000MHz
> Sat Sep 14 09:10:12 UTC 2019 88° 4152mV 1000MHz
> Sat Sep 14 09:10:14 UTC 2019 88° 4070mV 1000MHz
> Sat Sep 14 09:10:15 UTC 2019 88° 4064mV 1000MHz
> Sat Sep 14 09:10:17 UTC 2019 88° 4170mV 1000MHz
> Sat Sep 14 09:10:18 UTC 2019 88° 4058mV 1000MHz
> Sat Sep 14 09:10:19 UTC 2019 88° 4187mV 1000MHz
> Sat Sep 14 09:10:21 UTC 2019 88° 4093mV 1000MHz
> Sat Sep 14 09:10:22 UTC 2019 88° 4087mV 1000MHz
> Sat Sep 14 09:10:23 UTC 2019 90° 4070mV 1000MHz

Should we be a little more conservative? Without knowing the sensor's
accuracy, I believe we do not want to run at 800 MHz or 1 GHz at 90C,
so if we made this value 89 instead of 90, we would start throttling a
little earlier.
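
One way to check how much earlier a lower trip would fire is to replay
a log against both values. Below is a minimal sketch, not part of the
patch. It assumes a trip fires as soon as a reading is at or above the
trip temperature, which is a simplification of what the thermal
governor really does. first_trigger is a name I made up, and the
readings are copied from the log and converted to millicelsius.

```python
def first_trigger(samples, trip_mc):
    """Return the timestamp of the first sample at or above trip_mc, else None.

    samples: iterable of (timestamp, temperature in millicelsius).
    Assumption: a trip fires when temperature >= trip. The real thermal
    governor also looks at trend and cooling state.
    """
    for stamp, temp_mc in samples:
        if temp_mc >= trip_mc:
            return stamp
    return None


# Readings copied from the log (degrees C converted to millicelsius).
SAMPLES = [
    ("09:10:21", 88000),
    ("09:10:22", 88000),
    ("09:10:23", 90000),
    ("09:10:25", 88000),
]

for trip in (89000, 90000):
    print(trip, first_trigger(SAMPLES, trip))
```

If the sensor only reports coarse steps near the threshold, both trips
may fire on the same sample, so this is worth running over the full log
before settling on 89.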

> Sat Sep 14 09:10:25 UTC 2019 88° 4123mV 800MHz
> Sat Sep 14 09:10:26 UTC 2019 88° 4064mV 1000MHz
> Sat Sep 14 09:10:28 UTC 2019 90° 4058mV 1000MHz

Again here, if I interpret the data sheet correctly, we're technically
out of spec.
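
To put a number on that, here is a rough sketch (again, not part of
the patch) that scans a log in the format above and lists the samples
that are at or above a limit while the CPU is still at the top OPP.
The 90 C limit and the 1000 MHz top frequency are just parameters I
picked, and the regex is my assumption about the output format of the
high-load script.

```python
import re

# Assumed line format, taken from the log above:
#   Sat Sep 14 09:10:23 UTC 2019 90° 4070mV 1000MHz
# \S* after the temperature swallows the degree sign (or any mangled form of it).
LINE_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2}) UTC \d{4} (\d+)\S*\s+(\d+)mV\s+(\d+)MHz"
)


def parse_line(line):
    """Return (time, temp_c, millivolts, mhz), or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    time, temp, mv, mhz = m.groups()
    return time, int(temp), int(mv), int(mhz)


def hot_at_full_speed(lines, limit_c=90, top_mhz=1000):
    """List samples with temperature >= limit_c while running at >= top_mhz."""
    hits = []
    for line in lines:
        sample = parse_line(line)
        if sample and sample[1] >= limit_c and sample[3] >= top_mhz:
            hits.append(sample)
    return hits


SAMPLE = """\
Sat Sep 14 09:10:22 UTC 2019 88° 4087mV 1000MHz
Sat Sep 14 09:10:23 UTC 2019 90° 4070mV 1000MHz
Sat Sep 14 09:10:25 UTC 2019 88° 4123mV 800MHz
Sat Sep 14 09:10:26 UTC 2019 88° 4064mV 1000MHz
Sat Sep 14 09:10:28 UTC 2019 90° 4058mV 1000MHz
""".splitlines()

for time, temp, mv, mhz in hot_at_full_speed(SAMPLE):
    print(f"{time}: {temp} C at {mhz} MHz")
```

Quoted lines with a leading "> " are matched as well, since the regex
searches inside the line instead of anchoring at its start.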

> Sat Sep 14 09:10:29 UTC 2019 88° 4076mV 1000MHz
> Sat Sep 14 09:10:30 UTC 2019 88° 4064mV 1000MHz
> Sat Sep 14 09:10:32 UTC 2019 88° 4117mV 1000MHz
> Sat Sep 14 09:10:33 UTC 2019 88° 4105mV 800MHz
> Sat Sep 14 09:10:34 UTC 2019 88° 4070mV 1000MHz
> Sat Sep 14 09:10:36 UTC 2019 88° 4076mV 1000MHz
> Sat Sep 14 09:10:37 UTC 2019 88° 4087mV 1000MHz
> Sat Sep 14 09:10:39 UTC 2019 88° 4017mV 1000MHz
> Sat Sep 14 09:10:40 UTC 2019 88° 4093mV 1000MHz
> Sat Sep 14 09:10:41 UTC 2019 88° 4058mV 800MHz
> Sat Sep 14 09:10:42 UTC 2019 88° 4035mV 1000MHz
> Sat Sep 14 09:10:44 UTC 2019 90° 4058mV 1000MHz
> Sat Sep 14 09:10:45 UTC 2019 88° 4064mV 1000MHz
> Sat Sep 14 09:10:47 UTC 2019 88° 4064mV 1000MHz
> Sat Sep 14 09:10:48 UTC 2019 88° 4029mV 1000MHz
> Sat Sep 14 09:10:50 UTC 2019 90° 4046mV 1000MHz
> ^Ckill 4680
> root@letux:~# cpufreq-info
> cpufrequtils 008: cpufreq-info (C) Dominik Brodowski 2004-2009
> Report errors and bugs to cpufreq@xxxxxxxxxxxxxxx, please.
> analyzing CPU 0:
> driver: cpufreq-dt
> CPUs which run at the same hardware frequency: 0
> CPUs which need to have their frequency coordinated by software: 0
> maximum transition latency: 300 us.
> hardware limits: 300 MHz - 1000 MHz
> available frequency steps: 300 MHz, 600 MHz, 800 MHz, 1000 MHz
> available cpufreq governors: conservative, userspace, powersave, ondemand, performance
> current policy: frequency should be within 300 MHz and 1000 MHz.
> The governor "ondemand" may decide which speed to use
> within this range.
> current CPU frequency is 600 MHz (asserted by call to hardware).
> cpufreq stats: 300 MHz:22.81%, 600 MHz:2.50%, 800 MHz:2.10%, 1000 MHz:72.59% (1563)
> root@letux:~#
>
> So OPP is reduced if bandgap sensor reports >= 90°C
> which almost immediately makes the temperature
> go down.
>
> No operational hiccups were observed.
>
> Surface temperature of the PoP chip did rise to
> approx. 53°C during this test.
>
> Tested-by: H. Nikolaus Schaller <hns@xxxxxxxxxxxxx> # on GTA04A5 with dm3730cbp100
>
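
For anyone who wants to confirm which trip values a board actually ends
up with after its device tree overrides, the thermal sysfs interface
exposes them per zone. Here is a small sketch that reads them back.
read_trips is a name I made up, and thermal_zone0 in the usage comment
is an assumption, since the zone index depends on the board. All
values are in millicelsius.

```python
from pathlib import Path


def read_trips(zone_dir):
    """Read trip points from a thermal zone directory in sysfs.

    Returns a list of dicts with 'type', 'temp_mc' and 'hyst_mc'
    ('hyst_mc' is None when the hysteresis file is absent).
    """
    zone = Path(zone_dir)
    trips = []
    index = 0
    # Trip points are numbered from 0; stop at the first missing index.
    while (zone / f"trip_point_{index}_temp").exists():
        hyst_file = zone / f"trip_point_{index}_hyst"
        trips.append({
            "type": (zone / f"trip_point_{index}_type").read_text().strip(),
            "temp_mc": int((zone / f"trip_point_{index}_temp").read_text()),
            "hyst_mc": int(hyst_file.read_text()) if hyst_file.exists() else None,
        })
        index += 1
    return trips


# Usage on the target (zone index is board dependent):
#   for trip in read_trips("/sys/class/thermal/thermal_zone0"):
#       print(trip)
```

With this patch applied and no board override, I would expect a
passive trip at 90000 and a critical trip at 105000.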