Re: (bug report) HWMON & Thermal interactions #forregzbot
From: Thorsten Leemhuis
Date: Mon Oct 24 2022 - 06:18:17 EST
[Note: this mail is primarily send for documentation purposes and/or for
regzbot, my Linux kernel regression tracking bot. That's why I removed
most or all folks from the list of recipients, but left any that looked
like a mailing lists. These mails usually contain '#forregzbot' in the
subject, to make them easy to spot and filter out.]
[TLDR: I'm adding this regression report to the list of tracked
regressions; all text from me you find below is based on a few templates
paragraphs you might have encountered already already in similar form.]
Hi, this is your Linux kernel regression tracker. CCing the regression
mailing list, as it should be in the loop for all regressions, as
explained here:
https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html
On 23.10.22 20:27, Cristian Marussi wrote:
>
> Starting with v6.1-rc1 the SCMI HWMON driver failed probing on my JUNO due
> to the fact that no trip points were (ever !) defined in the DT; bisecting it
> looks like that after:
Thanks for the report. To be sure below issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, my Linux kernel regression
tracking bot:
#regzbot ^introduced e51813313
#regzbot title SCMI HWMON driver failed probing on JUNO
#regzbot ignore-activity
This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply -- ideally with also
telling regzbot about it, as explained here:
https://linux-regtracking.leemhuis.info/tracked-regression/
Reminder for developers: When fixing the issue, add 'Link:' tags
pointing to the report (the mail this one replies to), as explained for
in the Linux kernel's documentation; above webpage explains why this is
important for tracked regressions.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.
> https://lore.kernel.org/all/20220804224349.1926752-28-daniel.lezcano@xxxxxxxxxx/
>
> the presence of the mandatory trips node within thermal zones is now
> enforced.
>
> So, this is NOT what this bug report is about (I'll post soon patches for
> the JUNO DT missing trips) BUT once this problem was solved in the DT,
> another issue appeared:
>
> [ 1.921929] hwmon hwmon0: temp2_input not attached to any thermal zone
>
> that despite having now a goodi/valid DT describing 2 sensors and 2 thermal zones
> embedding that sensors, only the first one is found as belonging to one ThermZ.
> (this happens ALSO with v6.0 once I added the trips...)
>
> Digging deep into this, it turned out that inside the call chain
>
> devm_hwmon_device_register_with_info
> hwmon_device_register_with_info
> __hwmon_device_register
> hwmon_thermal_register_sensors(dev)
> --> hwmon_thermal_add_sensor(dev, j)
> --> devm_thermal_of_zone_register(dev, sensor_id, tdata, )
>
> the HWMON channel index j is passed to the Thermal framework in order to
> search and bind sensors with defined thermal zone, but this lead to the
> assumption that sequential HWMON channel indexes corresponds one-to-one to the
> underlying real sensor IDs that the ThermalFramework uses for matching
> within the DT.
>
> On a system like my SCMI-based DT where I have 2 temp-sensors bound to 2
> thermal zones like:
>
> thernal_zones {
> pmic {
> ...
> thermal-sensors = <&scmi_sensors0 0>;
> ...
> trips {
> ...
> }
> soc {
> ...
> thermal-sensors = <&scmi_sensors0 3>;
> ...
> trips {
> ...
> }
> }
> }
>
> This works fine by chance for the pmic (j=0, sensor_id=0) BUT cannot work for
> the soc where J=1 BUT the real sensor ID is 3.
>
> Note that there can be a number of sensors, not all of them of a type handled
> by HWMON, and enumerated by SCMI in different ways depending on the
> platform.
>
> I suppose this is not an SCMI-only related issue, but maybe in non-SCMI
> context, where sensors are purely defined in the DT, the solution can be
> more easily attained (i.e. renumber the sensors).
>
> At first I tried to solve this inside scmi-hwmon.c BUT I could not find
> a way to present to the HWMON subsystem the list of sensors preserving
> the above index/sensor_id matching (not even with a hack like passing
> down dummy sensors to the HWMON subsystem to fill the 'holes' in the
> numbering)
>
> My tentative solution, which works fine for me in my context, was to add
> an optional HWMON hwops, so that the core hwmon can retrieve if needed the
> real sensor ID if different from the channel index (using an optional hwops
> instead of some static hwinfo var let me avoid to have to patch all the
> existent hwmon drivers that happens to just work fine as of today...but
> maybe it is not necessarily the proper final solution...)
>
> i.e.
>
> ----8<----
>
> Author: Cristian Marussi <cristian.marussi@xxxxxxx>
> Date: Fri Oct 21 17:24:04 2022 +0100
>
> hwmon: Add new .get_sensor_id hwops
>
> Add a new optional helper which can be defined to allow an hwmon chip to
> provide the logic to map hwmon indexes to the real underlying sensor IDs.
>
> Signed-off-by: Cristian Marussi <cristian.marussi@xxxxxxx>
>
> diff --git a/drivers/hwmon/hwmon.c b/drivers/hwmon/hwmon.c
> index 4218750d5a66..45d3d5070cde 100644
> --- a/drivers/hwmon/hwmon.c
> +++ b/drivers/hwmon/hwmon.c
> @@ -213,7 +213,8 @@ static void hwmon_thermal_remove_sensor(void *data)
> list_del(data);
> }
>
> -static int hwmon_thermal_add_sensor(struct device *dev, int index)
> +static int hwmon_thermal_add_sensor(struct device *dev, int index,
> 7+ unsigned int sensor_id)
> {
> struct hwmon_device *hwdev = to_hwmon_device(dev);
> struct hwmon_thermal_data *tdata;
> @@ -227,7 +228,7 @@ static int hwmon_thermal_add_sensor(struct device *dev, int index)
> tdata->dev = dev;
> tdata->index = index;
>
> - tzd = devm_thermal_of_zone_register(dev, index, tdata,
> + tzd = devm_thermal_of_zone_register(dev, sensor_id, tdata,
> &hwmon_thermal_ops);
> if (IS_ERR(tzd)) {
> if (PTR_ERR(tzd) != -ENODEV)
> @@ -264,13 +265,18 @@ static int hwmon_thermal_register_sensors(struct device *dev)
>
> for (j = 0; info[i]->config[j]; j++) {
> int err;
> + unsigned int id;
>
> if (!(info[i]->config[j] & HWMON_T_INPUT) ||
> !chip->ops->is_visible(drvdata, hwmon_temp,
> hwmon_temp_input, j))
> continue;
>
> - err = hwmon_thermal_add_sensor(dev, j);
> + id = !chip->ops->get_sensor_id ? j :
> + chip->ops->get_sensor_id(drvdata,
> + hwmon_temp, j);
> +
> + err = hwmon_thermal_add_sensor(dev, j, id);
> if (err)
> return err;
> }
> diff --git a/include/linux/hwmon.h b/include/linux/hwmon.h
> index 14325f93c6b2..e5dbab83f4d1 100644
> --- a/include/linux/hwmon.h
> +++ b/include/linux/hwmon.h
> @@ -396,6 +396,9 @@ enum hwmon_intrusion_attributes {
> struct hwmon_ops {
> umode_t (*is_visible)(const void *drvdata, enum hwmon_sensor_types type,
> u32 attr, int channel);
> + unsigned int (*get_sensor_id)(const void *drvdata,
> + enum hwmon_sensor_types type,
> + int channel);
> int (*read)(struct device *dev, enum hwmon_sensor_types type,
> u32 attr, int channel, long *val);
> int (*read_string)(struct device *dev, enum hwmon_sensor_types type,
>
>
> ----->8----
>
>
> ... plus obviously the related scmi-hwmon.c patch to make use of this.
>
> Any thought ? Am I missing something ?
> (not really an expert on both subsystems really ... :P)
>
> Thanks,
> Cristian
>