Re: [PATCH v3 3/3] dt-bindings: thermal: Add yaml bindings for thermal zones

From: Amit Kucheria
Date: Wed Mar 25 2020 - 11:42:27 EST


On Wed, Mar 25, 2020 at 4:36 PM Lukasz Luba <lukasz.luba@xxxxxxx> wrote:
>
>
>
> On 3/25/20 6:34 AM, Amit Kucheria wrote:
> > As part of moving the thermal bindings to YAML, split it up into 3
> > bindings: thermal sensors, cooling devices and thermal zones.
> >
> > The thermal-zone binding is a software abstraction to capture the
> > properties of each zone - how often they should be checked, the
> > temperature thresholds (trips) at which mitigation actions need to be
> > taken and the level of mitigation needed at those thresholds.
> >
> > Signed-off-by: Amit Kucheria <amit.kucheria@xxxxxxxxxx>
> > ---
> > Changes since v2:
> > - Addressed review comment from Rob
> > - Added required properties for thermal-zones node
> > - Added select: true to thermal-cooling-devices.yaml
> > - Fixed up example to pass dt_binding_check
> >
> > .../bindings/thermal/thermal-zones.yaml | 324 ++++++++++++++++++
> > 1 file changed, 324 insertions(+)
> > create mode 100644 Documentation/devicetree/bindings/thermal/thermal-zones.yaml
> >
> > diff --git a/Documentation/devicetree/bindings/thermal/thermal-zones.yaml b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
> > new file mode 100644
> > index 000000000000..5632304dcf62
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
> > @@ -0,0 +1,324 @@
> > +# SPDX-License-Identifier: (GPL-2.0)
> > +# Copyright 2020 Linaro Ltd.
> > +%YAML 1.2
> > +---
> > +$id: http://devicetree.org/schemas/thermal/thermal-zones.yaml#
> > +$schema: http://devicetree.org/meta-schemas/base.yaml#
> > +
> > +title: Thermal zone binding
> > +
> > +maintainers:
> > + - Amit Kucheria <amitk@xxxxxxxxxx>
> > +
> > +description: |
> > + Thermal management is achieved in devicetree by describing the sensor hardware
> > + and the software abstraction of cooling devices and thermal zones required to
> > + take appropriate action to mitigate thermal overloads.
> > +
> > + The following node types are used to completely describe a thermal management
> > + system in devicetree:
> > + - thermal-sensor: device that measures temperature, has SoC-specific bindings
> > + - cooling-device: device used to dissipate heat either passively or actively
> > + - thermal-zones: a container of the following node types used to describe all
> > + thermal data for the platform
> > +
> > + This binding describes the thermal-zones.
> > +
> > + The polling-delay properties of a thermal-zone are bound to the maximum dT/dt
> > + (temperature derivative over time) in two situations for a thermal zone:
> > + 1. when passive cooling is activated (polling-delay-passive)
> > + 2. when the zone just needs to be monitored (polling-delay) or when
> > + active cooling is activated.
> > +
> > + The maximum dT/dt is highly bound to hardware power consumption and
> > + dissipation capability. The delays should be chosen to account for said
> > + max dT/dt, such that a device does not cross several trip boundaries
> > + unexpectedly between polls. Choosing the right polling delays shall avoid
> > + having the device in temperature ranges that may damage the silicon structures
> > + and reduce silicon lifetime.
> > +
> > +properties:
> > + $nodename:
> > + const: thermal-zones
> > + description:
> > + A /thermal-zones node is required in order to use the thermal framework to
> > + manage input from the various thermal zones in the system in order to
> > + mitigate thermal overload conditions. It does not represent a real device
> > + in the system, but acts as a container to link thermal sensor devices,
>
> I would say 'thermal sensor device', since there is 1-to-1 mapping and
> aggregating a few sensors inside one tz is not allowed (or I missed
> some patches queuing).

See below.

>
> > + platform-data regarding temperature thresholds and the mitigation actions
> > + to take when the temperature crosses those thresholds.
> > +
> > +patternProperties:
> > + "^[a-zA-Z][a-zA-Z0-9\\-]{1,12}-thermal$":
> > + type: object
> > + description:
> > + Each thermal zone node contains information about how frequently it
> > + must be checked, the sensor responsible for reporting temperature for
> > + this zone, one sub-node containing the various trip points for this
> > + zone and one sub-node containing all the zone cooling-maps.
> > +
> > + properties:
> > + polling-delay:
> > + $ref: /schemas/types.yaml#/definitions/uint32
> > + description:
> > + The maximum number of milliseconds to wait between polls when
> > + checking this thermal zone. Setting this to 0 disables the polling
> > + timers setup by the thermal framework and assumes that the thermal
> > + sensors in this zone support interrupts.
> > +
> > + polling-delay-passive:
> > + $ref: /schemas/types.yaml#/definitions/uint32
> > + description:
> > + The maximum number of milliseconds to wait between polls when
> > + checking this thermal zone while doing passive cooling. Setting
> > + this to 0 disables the polling timers setup by the thermal
> > + framework and assumes that the thermal sensors in this zone
> > + support interrupts.
> > +
> > + thermal-sensors:
> > + $ref: /schemas/types.yaml#/definitions/phandle-array
> > + description:
> > + A list of thermal sensor phandles and sensor specifiers used to
> > + monitor this thermal zone.
>
> I don't know why it's not consistent with the actual code in
> of-thermal.c, where there is even a comment stated:
> /* For now, thermal framework supports only 1 sensor per zone */
>
> I think this is the place where developers should be informed about
> the limitation and not even try to put more sensors into the list.

That is a good point. I'm currently "porting" the existing binding as
described in thermal.txt to yaml. If you look at some of the example
(c) in there, the bindings allow many sensors to a zone mapping but
the thermal core doesn't implement that functionality.

So should we fix the core code or change the bindings? Thoughts - Rob,
Daniel, Rui?

> > +
> > + trips:
> > + type: object
> > + description:
> > + This node describes a set of points in the temperature domain at
> > + which the thermal framework needs to takes action. The actions to
>
> s/needs to takes/needs to take/

Will fix.

> > + be taken are defined in another node called cooling-maps.
> > +
> > + patternProperties:
> > + "^[a-zA-Z][a-zA-Z0-9\\-_]{0,63}$":
> > + type: object
> > +
> > + properties:
> > + temperature:
> > + $ref: /schemas/types.yaml#/definitions/int32
> > + minimum: -273000
> > + maximum: 200000
> > + description:
> > + An integer expressing the trip temperature in millicelsius.
> > +
> > + hysteresis:
> > + $ref: /schemas/types.yaml#/definitions/uint32
> > + description:
> > + An unsigned integer expressing the hysteresis delta with
> > + respect to the trip temperature property above, also in
> > + millicelsius.
>
> This property is worth a bit longer description.

Will improve the description.

> > +
> > + type:
> > + $ref: /schemas/types.yaml#/definitions/string
> > + enum:
> > + - active # enable active cooling e.g. fans
> > + - passive # enable passive cooling e.g. throttling cpu
> > + - hot # send notification to driver
> > + - critical # send notification to driver, trigger shutdown
> > + description: |
> > + There are four valid trip types: active, passive, hot,
> > + critical.
>
> [snip]
>
> > +
> > + thermal-zones {
> > + cpu0-thermal {
> > + polling-delay-passive = <250>;
> > + polling-delay = <1000>;
> > +
> > + thermal-sensors = <&tsens0 1>;
> > +
> > + trips {
> > + cpu0_alert0: trip-point0 {
> > + temperature = <90000>;
> > + hysteresis = <2000>;
> > + type = "passive";
> > + };
> > +
> > + cpu0_alert1: trip-point1 {
> > + temperature = <95000>;
> > + hysteresis = <2000>;
> > + type = "passive";
> > + };
> > +
> > + cpu0_crit: cpu_crit {
> > + temperature = <110000>;
> > + hysteresis = <1000>;
> > + type = "critical";
> > + };
> > + };
> > +
> > + cooling-maps {
> > + map0 {
> > + trip = <&cpu0_alert0>;
> > + cooling-device = <&CPU0 THERMAL_NO_LIMIT
> > + THERMAL_NO_LIMIT>,
> > + <&CPU1 THERMAL_NO_LIMIT
> > + THERMAL_NO_LIMIT>,
> > + <&CPU2 THERMAL_NO_LIMIT
> > + THERMAL_NO_LIMIT>,
> > + <&CPU3 THERMAL_NO_LIMIT
> > + THERMAL_NO_LIMIT>;
> > + };
> > +
> > + map1 {
> > + trip = <&cpu0_alert1>;
> > + cooling-device = <&CPU0 THERMAL_NO_LIMIT
> > + THERMAL_NO_LIMIT>,
> > + <&CPU1 THERMAL_NO_LIMIT
> > + THERMAL_NO_LIMIT>,
> > + <&CPU2 THERMAL_NO_LIMIT
> > + THERMAL_NO_LIMIT>,
> > + <&CPU3 THERMAL_NO_LIMIT
> > + THERMAL_NO_LIMIT>;
>
> From this two examples of handling cpu0_alert0 and cpu0_alert1 you
> cannot conclude anything (if you don't understand thermal framework (and
> probably IPA). As a simple example it would be better to put a comment
> with a description and limit min, max to a specific OPP:
>
> map0 {
> trip = <&cpu0_alert0>;
> /* Corresponds to 1400MHz in OPP table */
> cooling-device = <&CPU0 3 3>, <&CPU1 3 3>, <&CPU2 3 3>, <&CPU3 3 3>;
> };
>
> map1 {
> trip = <&cpu0_alert1>;
> /* Corresponds to 1000MHz in OPP table */
> cooling-device = <&CPU0 5 5>, <&CPU1 5 5>, <&CPU2 5 5>, <&CPU3 5 5>;
> };
>
> IMHO this kind of example would tell more to an avg driver developer.

Will fix.

Thanks for the review.

Regards,
Amit