Re: (bug report) HWMON & Thermal interactions

From: Cristian Marussi
Date: Mon Oct 24 2022 - 18:59:30 EST


On Mon, Oct 24, 2022 at 07:51:09AM -0700, Guenter Roeck wrote:
> On 10/24/22 05:47, Cristian Marussi wrote:
> > On Mon, Oct 24, 2022 at 04:56:43AM -0700, Guenter Roeck wrote:
> > > On 10/23/22 14:23, Guenter Roeck wrote:
> > > > On 10/23/22 11:27, Cristian Marussi wrote:
> > > > > Hi,
> > > > >
> > > > > Starting with v6.1-rc1 the SCMI HWMON driver failed probing on my JUNO due
> > > > > to the fact that no trip points were (ever !) defined in the DT; bisecting it
> > > > > looks like that after:
> > > > >
> > > > > https://lore.kernel.org/all/20220804224349.1926752-28-daniel.lezcano@xxxxxxxxxx/
> > > > >
> > > > > the presence of the mandatory trips node within thermal zones is now
> > > > > enforced.
> > > > >
> > > > > So, this is NOT what this bug report is about (I'll post soon patches for
> > > > > the JUNO DT missing trips) BUT once this problem was solved in the DT,
> > > > > another issue appeared:
> > > > >
> > > > > [    1.921929] hwmon hwmon0: temp2_input not attached to any thermal zone
> > > > >
> > > > > that despite having now a goodi/valid DT describing 2 sensors and 2 thermal zones
> > > > > embedding that sensors, only the first one is found as belonging to one ThermZ.
> > > > > (this happens ALSO with v6.0 once I added the trips...)
> > > > >
> > > > > Digging deep into this, it turned out that inside the call chain
> > > > >
> > > > > devm_hwmon_device_register_with_info
> > > > >    hwmon_device_register_with_info
> > > > >      __hwmon_device_register
> > > > >     hwmon_thermal_register_sensors(dev)
> > > > >         --> hwmon_thermal_add_sensor(dev, j)
> > > > >             --> devm_thermal_of_zone_register(dev, sensor_id, tdata, )
> > > > >
> > > > > the HWMON channel index j is passed to the Thermal framework in order to
> > > > > search and bind sensors with defined thermal zone, but this lead to the
> > > > > assumption that sequential HWMON channel indexes corresponds one-to-one to the
> > > > > underlying real sensor IDs that the ThermalFramework uses for matching
> > > > > within the DT.
> > > > >
> > > > > On a system like my SCMI-based DT where I have 2 temp-sensors bound to 2
> > > > > thermal zones like:
> > > > >
> > > > > thernal_zones {
> > > > >     pmic {
> > > > >         ...
> > > > >         thermal-sensors = <&scmi_sensors0 0>;
> > > > >         ...
> > > > >         trips {
> > > > >             ...
> > > > >         }
> > > > >     soc {
> > > > >         ...
> > > > >         thermal-sensors = <&scmi_sensors0 3>;
> > > > >         ...
> > > > >         trips {
> > > > >             ...
> > > > >         }
> > > > >     }
> > > > > }
> > > > >
> > > > > This works fine by chance for the pmic (j=0, sensor_id=0) BUT cannot work for
> > > > > the soc where J=1 BUT the real sensor ID is 3.
> > > > >
> > > > > Note that there can be a number of sensors, not all of them of a type handled
> > > > > by HWMON, and enumerated by SCMI in different ways depending on the
> > > > > platform.
> > > > >
> > > > > I suppose this is not an SCMI-only related issue, but maybe in non-SCMI
> > > > > context, where sensors are purely defined in the DT, the solution can be
> > > > > more easily attained (i.e. renumber the sensors).
> > > > >
> > > > > At first I tried to solve this inside scmi-hwmon.c BUT I could not find
> > > > > a way to present to the HWMON subsystem the list of sensors preserving
> > > > > the above index/sensor_id matching (not even with a hack like passing
> > > > > down dummy sensors to the HWMON subsystem to fill the 'holes' in the
> > > > > numbering)
> > > > >
> > > > > My tentative solution, which works fine for me in my context, was to add
> > > > > an optional HWMON hwops, so that the core hwmon can retrieve if needed the
> > > > > real sensor ID if different from the channel index (using an optional hwops
> > > > > instead of some static hwinfo var let me avoid to have to patch all the
> > > > > existent hwmon drivers that happens to just work fine as of today...but
> > > > > maybe it is not necessarily the proper final solution...)
> > > > >
> > > > > i.e.
> > > > >
> > > > > ----8<----
> > > > >
> > > > > Author: Cristian Marussi <cristian.marussi@xxxxxxx>
> > > > > Date:   Fri Oct 21 17:24:04 2022 +0100
> > > > >
> > > > >      hwmon: Add new .get_sensor_id hwops
> > > > >      Add a new optional helper which can be defined to allow an hwmon chip to
> > > > >      provide the logic to map hwmon indexes to the real underlying sensor IDs.
> > > >
> > > > Maybe I am missing something, but ...
> > > >
> > > > The driver isn't supposed to know anything about thermal devices and
> > > > thermal zones. If that no longer works, and drivers have to know about
> > > > thermal zones and thermal zone device index values anyway, we might
> > > > as well pull thermal device support from the hwmon core and implement
> > > > it in drivers.
> > > >
> > >
> > > No, wait: The question is really: Why does the scmi driver present the sensor
> > > with index 3 to the hwmon subsystem as sensor with index 1 ?
> > >
> > > If the sensor has index 3, and is presented to other entities as sensor
> > > with index 3, it should be presented to the hwmon subsystem as sensor with
> > > index 3, not with index 1. If sensors with index 1..2 do not exist,
> > > the is_visible function should return 0 for those sensors.
> > >
> >
> > My understanding was that the hwmon index is the index of the channel
> > and hwmon_channel_info struct groups channels by type while the index is
> > really used as a pointer in the hwmon_channel_info.config field, so in
> > this case you're saying I should present 4 temp sensors placing a 'hole'
> > at sensor 1,2 making is_visible return 0 for those channels ?
> >
> > Basically keeping the channel indexes in sync with the real sensor ID by
> > the means of some dummy sensor entries in the config field: this could result
> > potentially in a lot of holes given in SCMI the sensor_id is 16 bits and
> > I thought that was too hackish but I can try.
> >
>
> The underlying idea with the hwmon -> thermal bridge is that index values
> used by thermal and by the hwmon subsystem match and, yes, that there would
> if necessary be holes in hwmon index values (normally this is not a 16-bit
> number space). If that doesn't work for scmi, and if there could indeed be
> something like
>
> thermal-sensors = <&scmi_sensors0 12345>;
>
> then I think the solution is indeed to not rely on the hwmon->thermal bridge
> in the hwmon core for this driver.

Even though implausible it could be possible to have an SCMI fw platform
advertising such high sensor IDs.

>
> > In the meantime, I gave it a go at what you suggested early (if I got it
> > right...) by removing from the scmi-hwmon driver the HWMON_C_REGISTER_TZ
> > attribute and adding a few explicit calls to devm_thermal_of_zone_register() at
> > the end of the probe to specifically register the needed temp sensors (and
> > associated real sensor IDs) with the ThermalFramework without relying on the
> > HWMON core for Thermal and it works fine indeed.
> >
>
> Excellent.
>

I'll follow this path.

Thanks for your help & feedback.
Cristian