Re: [PATCH v3 3/3] dt-bindings: thermal: Add yaml bindings for thermal zones

From: Amit Kucheria
Date: Mon Mar 30 2020 - 06:34:39 EST


On Wed, Mar 25, 2020 at 9:12 PM Amit Kucheria <amit.kucheria@xxxxxxxxxx> wrote:
>
> On Wed, Mar 25, 2020 at 4:36 PM Lukasz Luba <lukasz.luba@xxxxxxx> wrote:
> >
> >
> >
> > On 3/25/20 6:34 AM, Amit Kucheria wrote:
> > > As part of moving the thermal bindings to YAML, split it up into 3
> > > bindings: thermal sensors, cooling devices and thermal zones.
> > >
> > > The thermal-zone binding is a software abstraction to capture the
> > > properties of each zone - how often they should be checked, the
> > > temperature thresholds (trips) at which mitigation actions need to be
> > > taken and the level of mitigation needed at those thresholds.
> > >
> > > Signed-off-by: Amit Kucheria <amit.kucheria@xxxxxxxxxx>
> > > ---
> > > Changes since v2:
> > > - Addressed review comment from Rob
> > > - Added required properties for thermal-zones node
> > > - Added select: true to thermal-cooling-devices.yaml
> > > - Fixed up example to pass dt_binding_check
> > >
> > > .../bindings/thermal/thermal-zones.yaml | 324 ++++++++++++++++++
> > > 1 file changed, 324 insertions(+)
> > > create mode 100644 Documentation/devicetree/bindings/thermal/thermal-zones.yaml
> > >
> > > diff --git a/Documentation/devicetree/bindings/thermal/thermal-zones.yaml b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
> > > new file mode 100644
> > > index 000000000000..5632304dcf62
> > > --- /dev/null
> > > +++ b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
> > > @@ -0,0 +1,324 @@
> > > +# SPDX-License-Identifier: (GPL-2.0)
> > > +# Copyright 2020 Linaro Ltd.
> > > +%YAML 1.2
> > > +---
> > > +$id: http://devicetree.org/schemas/thermal/thermal-zones.yaml#
> > > +$schema: http://devicetree.org/meta-schemas/base.yaml#
> > > +
> > > +title: Thermal zone binding
> > > +
> > > +maintainers:
> > > + - Amit Kucheria <amitk@xxxxxxxxxx>
> > > +
> > > +description: |
> > > + Thermal management is achieved in devicetree by describing the sensor hardware
> > > + and the software abstraction of cooling devices and thermal zones required to
> > > + take appropriate action to mitigate thermal overloads.
> > > +
> > > + The following node types are used to completely describe a thermal management
> > > + system in devicetree:
> > > + - thermal-sensor: device that measures temperature, has SoC-specific bindings
> > > + - cooling-device: device used to dissipate heat either passively or actively
> > > + - thermal-zones: a container of the following node types used to describe all
> > > + thermal data for the platform
> > > +
> > > + This binding describes the thermal-zones.
> > > +
> > > + The polling-delay properties of a thermal-zone are bound to the maximum dT/dt
> > > + (temperature derivative over time) in two situations for a thermal zone:
> > > + 1. when passive cooling is activated (polling-delay-passive)
> > > + 2. when the zone just needs to be monitored (polling-delay) or when
> > > + active cooling is activated.
> > > +
> > > + The maximum dT/dt is highly bound to hardware power consumption and
> > > + dissipation capability. The delays should be chosen to account for said
> > > + max dT/dt, such that a device does not cross several trip boundaries
> > > + unexpectedly between polls. Choosing the right polling delays shall avoid
> > > + having the device in temperature ranges that may damage the silicon structures
> > > + and reduce silicon lifetime.
> > > +
> > > +properties:
> > > + $nodename:
> > > + const: thermal-zones
> > > + description:
> > > + A /thermal-zones node is required in order to use the thermal framework to
> > > + manage input from the various thermal zones in the system in order to
> > > + mitigate thermal overload conditions. It does not represent a real device
> > > + in the system, but acts as a container to link thermal sensor devices,
> >
> > I would say 'thermal sensor device', since there is 1-to-1 mapping and
> > aggregating a few sensors inside one tz is not allowed (or I missed
> > some patches queuing).
>
> See below.
>
> >
> > > + platform-data regarding temperature thresholds and the mitigation actions
> > > + to take when the temperature crosses those thresholds.
> > > +
> > > +patternProperties:
> > > + "^[a-zA-Z][a-zA-Z0-9\\-]{1,12}-thermal$":
> > > + type: object
> > > + description:
> > > + Each thermal zone node contains information about how frequently it
> > > + must be checked, the sensor responsible for reporting temperature for
> > > + this zone, one sub-node containing the various trip points for this
> > > + zone and one sub-node containing all the zone cooling-maps.
> > > +
> > > + properties:
> > > + polling-delay:
> > > + $ref: /schemas/types.yaml#/definitions/uint32
> > > + description:
> > > + The maximum number of milliseconds to wait between polls when
> > > + checking this thermal zone. Setting this to 0 disables the polling
> > > + timers setup by the thermal framework and assumes that the thermal
> > > + sensors in this zone support interrupts.
> > > +
> > > + polling-delay-passive:
> > > + $ref: /schemas/types.yaml#/definitions/uint32
> > > + description:
> > > + The maximum number of milliseconds to wait between polls when
> > > + checking this thermal zone while doing passive cooling. Setting
> > > + this to 0 disables the polling timers setup by the thermal
> > > + framework and assumes that the thermal sensors in this zone
> > > + support interrupts.
> > > +
> > > + thermal-sensors:
> > > + $ref: /schemas/types.yaml#/definitions/phandle-array
> > > + description:
> > > + A list of thermal sensor phandles and sensor specifiers used to
> > > + monitor this thermal zone.
> >
> > I don't know why it's not consistent with the actual code in
> > of-thermal.c, where there is even a comment stated:
> > /* For now, thermal framework supports only 1 sensor per zone */
> >
> > I think this is the place where developers should be informed about
> > the limitation and not even try to put more sensors into the list.
>
> That is a good point. I'm currently "porting" the existing binding as
> described in thermal.txt to yaml. If you look at some of the example
> (c) in there, the bindings allow many sensors to a zone mapping but
> the thermal core doesn't implement that functionality.
>
> So should we fix the core code or change the bindings? Thoughts - Rob,
> Daniel, Rui?

Rob, Daniel: Any comments? We don't have any concerns for Linux
backward compatibility since multiple sensors per zone isn't used
anywhere. But asking since bindings are supposed to be OS-agnostic.

> > > +
> > > + trips:
> > > + type: object
> > > + description:
> > > + This node describes a set of points in the temperature domain at
> > > + which the thermal framework needs to takes action. The actions to
> >
> > s/needs to takes/needs to take/
>
> Will fix.
>
> > > + be taken are defined in another node called cooling-maps.
> > > +
> > > + patternProperties:
> > > + "^[a-zA-Z][a-zA-Z0-9\\-_]{0,63}$":
> > > + type: object
> > > +
> > > + properties:
> > > + temperature:
> > > + $ref: /schemas/types.yaml#/definitions/int32
> > > + minimum: -273000
> > > + maximum: 200000
> > > + description:
> > > + An integer expressing the trip temperature in millicelsius.
> > > +
> > > + hysteresis:
> > > + $ref: /schemas/types.yaml#/definitions/uint32
> > > + description:
> > > + An unsigned integer expressing the hysteresis delta with
> > > + respect to the trip temperature property above, also in
> > > + millicelsius.
> >
> > This property is worth a bit longer description.
>
> Will improve the description.
>
> > > +
> > > + type:
> > > + $ref: /schemas/types.yaml#/definitions/string
> > > + enum:
> > > + - active # enable active cooling e.g. fans
> > > + - passive # enable passive cooling e.g. throttling cpu
> > > + - hot # send notification to driver
> > > + - critical # send notification to driver, trigger shutdown
> > > + description: |
> > > + There are four valid trip types: active, passive, hot,
> > > + critical.
> >
> > [snip]
> >
> > > +
> > > + thermal-zones {
> > > + cpu0-thermal {
> > > + polling-delay-passive = <250>;
> > > + polling-delay = <1000>;
> > > +
> > > + thermal-sensors = <&tsens0 1>;
> > > +
> > > + trips {
> > > + cpu0_alert0: trip-point0 {
> > > + temperature = <90000>;
> > > + hysteresis = <2000>;
> > > + type = "passive";
> > > + };
> > > +
> > > + cpu0_alert1: trip-point1 {
> > > + temperature = <95000>;
> > > + hysteresis = <2000>;
> > > + type = "passive";
> > > + };
> > > +
> > > + cpu0_crit: cpu_crit {
> > > + temperature = <110000>;
> > > + hysteresis = <1000>;
> > > + type = "critical";
> > > + };
> > > + };
> > > +
> > > + cooling-maps {
> > > + map0 {
> > > + trip = <&cpu0_alert0>;
> > > + cooling-device = <&CPU0 THERMAL_NO_LIMIT
> > > + THERMAL_NO_LIMIT>,
> > > + <&CPU1 THERMAL_NO_LIMIT
> > > + THERMAL_NO_LIMIT>,
> > > + <&CPU2 THERMAL_NO_LIMIT
> > > + THERMAL_NO_LIMIT>,
> > > + <&CPU3 THERMAL_NO_LIMIT
> > > + THERMAL_NO_LIMIT>;
> > > + };
> > > +
> > > + map1 {
> > > + trip = <&cpu0_alert1>;
> > > + cooling-device = <&CPU0 THERMAL_NO_LIMIT
> > > + THERMAL_NO_LIMIT>,
> > > + <&CPU1 THERMAL_NO_LIMIT
> > > + THERMAL_NO_LIMIT>,
> > > + <&CPU2 THERMAL_NO_LIMIT
> > > + THERMAL_NO_LIMIT>,
> > > + <&CPU3 THERMAL_NO_LIMIT
> > > + THERMAL_NO_LIMIT>;
> >
> > From this two examples of handling cpu0_alert0 and cpu0_alert1 you
> > cannot conclude anything (if you don't understand thermal framework (and
> > probably IPA). As a simple example it would be better to put a comment
> > with a description and limit min, max to a specific OPP:
> >
> > map0 {
> > trip = <&cpu0_alert0>;
> > /* Corresponds to 1400MHz in OPP table */
> > cooling-device = <&CPU0 3 3>, <&CPU1 3 3>, <&CPU2 3 3>, <&CPU3 3 3>;
> > };
> >
> > map1 {
> > trip = <&cpu0_alert1>;
> > /* Corresponds to 1000MHz in OPP table */
> > cooling-device = <&CPU0 5 5>, <&CPU1 5 5>, <&CPU2 5 5>, <&CPU3 5 5>;
> > };
> >
> > IMHO this kind of example would tell more to an avg driver developer.
>
> Will fix.
>
> Thanks for the review.
>
> Regards,
> Amit