Re: [RFC] ARM: dts: omap36xx: Enable thermal throttling

From: Adam Ford
Date: Fri Sep 13 2019 - 17:01:15 EST


On Fri, Sep 13, 2019 at 3:35 PM H. Nikolaus Schaller <hns@xxxxxxxxxxxxx> wrote:
>
> Hi Daniel,
>
> > Am 13.09.2019 um 22:11 schrieb Daniel Lezcano <daniel.lezcano@xxxxxxxxxx>:
> >
> > On 13/09/2019 20:46, Adam Ford wrote:
> >> On Fri, Sep 13, 2019 at 12:18 PM Daniel Lezcano
> >> <daniel.lezcano@xxxxxxxxxx> wrote:
> >>>
> >>> On 13/09/2019 18:51, H. Nikolaus Schaller wrote:
> >>>
> >>> [ ... ]
> >>>
> >>>>> Good news (I think)
> >>>>>
> >>>>> With cooling-device = <&cpu 1 2> setup, I was able to ask the max
> >>>>> frequency and it returned 600MHz.
> >>>>>
> >>>>> # cat /sys/devices/virtual/thermal/thermal_zone0/temp
> >>>>> 58500
> >>>>> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies
> >>>>> 300000 600000 800000
> >>>>> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_m
> >>>>> scaling_max_freq scaling_min_freq
> >>>>> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
> >>>>> 600000
> >>>>
> >>>> looks good!
> >>>> But we have to understand what the <&cpu 1 2> exactly means...
> >>>>
> >>>> Hopefully someone reading your RFCv2 can answer...
> >>>
> >> Daniel,
> >>
> >> Thank you for replying.
> >>
> >>> I may have missed the question :)
> >>>
> >>> These are the states allowed for the cooling device (the one you can see
> >>> in the /sys/class/thermal/cooling_device0/max_state. As the logic is
> >>> inverted for cpufreq, that can be confusing.
> >>
> >> I think that's what has be confused.
> >>
> >>>
> >>> If it was a fan with, let's say 5 speeds, you would use <&fan 0 5>, so
> >>> when the mitigation begins the cooling device state is 0 and then the
> >>> thermal governor increase the state until it sees a cooling effect.
> >>>
> >>> If <&fan 0 2> is set, the governor won't set a state above 2 even if the
> >>> temperature increases.
> >>
> >> I am not sure I know what you mean by 'state' in this context.
> >
> > A thermal zone is managed by the thermal framework as the following:
> > - a sensor
> > - a governor
> > - a cooling device
> >
> > The governor gets the temperature via the sensor and depending on the
> > temperature it will increase or decrease the cooling effect of the
> > cooling device. With a fan, that means it will increase or decrease its
> > speed. With cpufreq, it will decrease or increase the OPP.
> >
> > These are discrete values the governor will use to set the cooling
> > effect. The state is one of these value (the current speed or the
> > current OPP index).
> >
> > Depending on the cooling device, the number of states are different.
> >
> > In the context above, the fan cooling device can be stopped (state=0),
> > running (state=1), running faster (state=2).
> >
> > As the node tells to use no more than 2, then the governor will never go
> > to running much faster (state=3). (That's an example).
> >
> >>> When the cooling driver is able to return the number of states it
> >>> supports, it is safe to set the states to THERMAL_NO_LIMIT and let the
> >>> governor to find the balance point.
> >>
> >> If the cooling driver is using cpufreq, is the number of supported
> >> states equal to the number of operating points given to cpufreq?
> >
> > Yes, absolutely if THERMAL_NO_LIMIT is set [1] (which is what is done
> > most of the cases). Otherwise it will use the boundaries set in <&cpu
> > limit_low limit_high>
> >
> > When changing the limits, a state=1 has a different meaning.
> >
> > For example: 7 OPPs available
> >
> > <&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT> : state=[0..7]
> >
> > <&cpu 0 2> : state=[0..2] (1, 2)
> >
> > <&cpu 5 7> : state=[0..3] (5, 6, 7)
> >
> > [1]
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/thermal/cpu_cooling.c#n334
> >
> >>> Now if the cooling device is cpufreq, the state order is inverted,
> >>> because the cooling effects happens when decreasing the OPP.
> >>>
> >>> If the boards support 7 OPPs, the state 0 is 7 - 0, so no mitigation, if
> >>> the state is 1, the cpufreq is throttle to the 6th OPP, 2 to the 5th OPP
> >>> etc.
> >>
> >> I am not sure how the state would be set to 2.
> >
> > That is a governor decision. Let me give an example with a hikey960
> > board which has very fast temperature transitions, so it is simpler to
> > illustrate the behavior. The trip point is 75ÂC.
> >
> > Imagine the CPU gets loaded 100%, the cpufreq sets the OPP to the max
> > (2.36GHz), as the temperature is still under 75ÂC, there is no
> > mitigation yet, so the cooling device state is 0.
> >
> > In a very few seconds the temperature reaches 75ÂC, that trigger the
> > monitoring of the thermal zone and the mitigation begins, then the
> > temperature continues to increase very quickly to 78ÂC, the governor see
> > we are above the trip point and increment the cooling device state
> > (state=>1). That leads to an OPP change from 2.36GHz to 2.11GHz.
> >
> > The governor continues to read the temperature and see the temperature
> > is still increasing (even if it is that happens more slowly), so it
> > increases the state again (state=>2). That leads to an OPP change from
> > 2.11GHz to 1.8GHz.
> >
> > The governor continues to read the temperature and see the temperature
> > decrease, it does nothing.
>
> Ah, I think our misunderstanding is that the govenor "enables" and
> "disables" a set of OPPs. Rather it goes down or up in the list if
> above or below a trip point.
>
> >
> > The governor continues to read the temperature, see the temperature
> > decreases and is below 75ÂC, it decrease the state (state=>1), the OPP
> > change to 2.36GHz.
> >
> > The temperature then increases, etc ...
> >
> > Actually the governors do more than that but it is for the example.
> >
> > So it is a bad idea to set boundaries for the cooling device state as
> > that may prevent the governor to take the right decision for the cooling
> > effect. Imagine in the example above, we set the max state to 1 for the
> > cooling device, that would mean the governor won't be able to stop the
> > temperature increasing, thus ending up to a hard reboot.
>
> Well, the data sheet only requires that the high speed OPPs are only
> used below 90ÂC. If I understand correctly if we set the trip point to
> 90ÂC it will simply go down through the full list of OPPs. This will
> clearly avoid the high speed OPPs (and potentially some low-speed
> ones, but that does not harm).
>
> So our approach "how to make it disable these two OPPs" seems to be
> wrong. Rather, we have to think "make sure the temperature
> stays below 90ÂC".
>
> And is it true that we do not have to define mapping for the "critical"
> trip points?
>
> >
> >>> Now the different combinations:
> >>>
> >>> <&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT> the governor will use the state
> >>> 0 to 7.
> >>>
> >>> <&cpu THERMAL_NO_LIMIT 2> the governor will use the state 0 to 2
> >>
> >> What would be the difference between <&cpu THERMAL_NO_LIMIT 2> and
> >> <&cpu 0 2> ?
> >> (if there is any)
> >
> > There is no difference.
> >
> >
> >>> <&cpu 1 2> the governor will use the state 1 and 2. That means there is
> >>> always the cooling effect as the governor won't set it to zero thus
> >>> stopping the mitigation.
> >>
> >> For the purposes of the board in question, we have 4 operating points,
> >> 300MHz, 600MHz, 800MHz and 1GHz. Once the board reaches 90C, we need
> >> them to cease operation at 800MHz and 1GHz and only permit operation
> >> at 300MHz and 600MHz. I am going under the assumption that the cpu
> >> index[0] would be for 300MHz, index[1] = 600MHz, etc.
> >>
> >> If I am interpreting your comment correctly, I should set <&cpu
> >> THERMAL_NO_LIMIT 2> which would allow it to either not cool and run up
> >> to 600MHz and not exceed, is that correct?
> >
> > Nope, it will mean the cooling device can only reduce to 800MHz and to
> > 600MHz to mitigate.
> >
> > Actually the thermal framework neither the kernel are designed to handle
> > this case. They assume the OPPs are stable whatever the thermal situation.
> >
> > That is the reason why I think it is a very interesting use case because
> > it introduces a temperature constraint in addition to a duration for a
> > certain OPP. IMO, that could be an extension of the turbo-mode.
> >
> > With what we have now, I doubt it is feasible.
> >
> > The best we can do is preventing to reach the 90ÂC, so we remove the OPP
> > temperature constraint. I suppose 85ÂC is a safe temperature to stick on.
> >
> > And in order to let the governor have free hand.
> >
> > <&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT>
> >
> > I don't think that will have a significant impact on performances
> > compared to be able to run at a higher temperature with less OPPs.

Thank you for the explanation. I think I'll ask Tony to drop this RFC
since we have what you're proposing already in a separate series. I
appreciate your explanations.

adam
> >
> >
> >>> Does it clarify the DT spec?
> >>>
> >>
> >> I think your reply to my inquiry might. If possible, it would be nice
> >> to get this documented into the bindings doc for others in the future.
> >> I can do it, but someone with a better understanding of the concept
> >> maybe more qualified. I can totally understand why some may want to
> >> integrate this into their SoC device trees to slow the processor when
> >> hot.
> >>
> >> Thank you for taking the time to review this. I appreciate it.
> >>
> >> adam
>
> BR,
> Nikolaus
>