Re: [PATCH RESEND v1] thermal: core: fix blocking in unregistering zone
From: Rafael J. Wysocki
Date: Wed Apr 08 2026 - 11:05:54 EST
On Sun, Apr 5, 2026 at 5:34 AM Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
>
> On 4/4/26 10:38, Rafael J. Wysocki wrote:
> > On Sat, Apr 4, 2026 at 4:02 PM Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
> >>
> >> On 4/4/26 05:58, Rafael J. Wysocki wrote:
> >>> On Fri, Apr 3, 2026 at 4:20 PM Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
> >>>>
> >>>> On 4/3/26 05:52, Rafael J. Wysocki wrote:
> >>>> .[ ... ]
> >>>>> It appears to work for me, but I'm not sure if having multiple hwmon class
> >>>>> devices with the same value in the name attribute is fine.
> >>>>
> >>>> Like this ?
> >>>>
> >>>> $ cd /sys/class/hwmon
> >>>> $ grep . */name
> >>>> hwmon0/name:r8169_0_c00:00
> >>>> hwmon1/name:nvme
> >>>> hwmon2/name:nvme
> >>>> hwmon3/name:nct6687
> >>>> hwmon4/name:k10temp
> >>>> hwmon5/name:spd5118
> >>>> hwmon6/name:spd5118
> >>>> hwmon7/name:spd5118
> >>>> hwmon8/name:spd5118
> >>>> hwmon9/name:mt7921_phy0
> >>>
> >>> Yes.
> >>>
> >>>> Names such as "r8169_0_c00:00" and "mt7921_phy0" are actually overkill
> >>>> since the "sensors" command makes it
> >>>>
> >>>> r8169_0_c00:00-mdio-0
> >>>> Adapter: MDIO adapter
> >>>> temp1: +36.0°C (high = +120.0°C)
> >>>>
> >>>> mt7921_phy0-pci-0d00
> >>>> Adapter: PCI adapter
> >>>> temp1: +30.0°C
> >>>>
> >>>> essentially duplicating the device index.
> >>>
> >>> Well, with the patch posted by me, the output of sensors from a test
> >>> system looks like this:
> >>>
> >>> acpitz-acpi-0
> >>> Adapter: ACPI interface
> >>> temp1: +16.8°C
> >>>
> >>> pch_cannonlake-virtual-0
> >>> Adapter: Virtual device
> >>> temp1: +33.0°C
> >>>
> >>> acpitz-acpi-0
> >>> Adapter: ACPI interface
> >>> temp1: +27.8°C
> >>>
> >>> (some further data excluded), which is kind of confusing (note the
> >>> duplicate acpitz-acpi-0 entries with different values of temp1).
> >>>
> >>
> >> Yes, agreed, that is confusing. I would have expected the second one
> >> to be identified as "acpitz-acpi-1". Do they both have the same parent ?
> >
> > No, they don't.
> >
> > The parent of each of them is a thermal zone device and both parents
> > have the same "type" value.
> >
> >>> That could be disambiguated by concatenating the thermal zone ID
> >>> (possibly after a '_') to the name. Or the "temp*" things for thermal
> >>> zones of the same type could carry different numbers.
> >>>
> >>> A less attractive alternative would be to register a special virtual
> >>> device serving as a parent for all hwmon interfaces registered
> >>> automatically for thermal zones.
> >>
> >> If they all have the same parent, technically it should be a single
> >> hwmon device with multiple sensors, as in:
> >>
> >> acpitz-acpi-0
> >> Adapter: ACPI interface
> >> temp1: +16.8°C
> >> temp2: +27.8°C
> >
> > So somebody tried to make it look like that by registering hwmon
> > interfaces for all of the thermal zones of the same type under one of
> > them, but that (quite obviously) doesn't work.
>
> Not sure I understand why that doesn't work or why that is obvious,
> but I'll take you by your word (I would agree that the current
> _implementation_ looks problematic).
For example, say that there are two ACPI thermal zones on a system
/sys/devices/virtual/thermal/thermal_zone0/
/sys/devices/virtual/thermal/thermal_zone1/
The current mainline code registers a hwmon class device for thermal_zone0 only:
/sys/devices/virtual/thermal/thermal_zone0/hwmon0/
because the type is "acpitz" for both of them, but it adds a sysfs
attribute that belongs to thermal_zone1 under it:
/sys/devices/virtual/thermal/thermal_zone0/hwmon0/temp2_input
There is also
/sys/devices/virtual/thermal/thermal_zone0/hwmon0/temp1_input
but it belongs to thermal_zone0.
Interesting things happen when thermal_zone0 is removed, for example
because the ACPI thermal driver is unbound from the underlying
platform device. Namely, the removal code skips the removal of hwmon0
because of the temp2_input attribute belonging to thermal_zone1 which
effectively prevents thermal_zone0 removal from making progress.
AFAICS, nothing particularly smart can be done to address this issue
while retaining the current design of the code. Reparenting hwmon0 to
thermal_zone1 may confuse user space as well as removing hwmon0 along
with temp2_input. That's why I think that this is a design issue.
> I looked into the source code of the "sensors" command. It indeed does
> not index ACPI devices (nor virtual devices, for that matter) but
> assumes that such devices are unique. My apologies for not realizing
> this earlier.
>
> So your only option is indeed to index the chip name if you want/need
> more than one hwmon device with the same base name (here: acpitz)
> instantiated from the thermal subsystem.
>
> One comment to one of your earlier e-mails:
>
> "However, it is more of a design issue IMV because putting temperature
> attributes for all of the (possibly unrelated) thermal zones of the
> same type under one hwmon interface is not particularly useful"
>
> A single hardware monitoring device, by design, serves multiple
> thermal zones. Anything else would not make sense for multi-channel
> hardware monitoring chips. The hardware monitoring subsystem groups
> sensors by chip, not by thermal zones.
>
> Having said this: This discussion is not new. Certain subsystems
> advocate for having one hardware monitoring device per sensor,
> not per chip. One submitter went as far as telling me that I am
> clueless. We don't need to repeat the exercise. I have my opinion,
> you have yours, and all we can do is to agree to disagree.
I'm not sure if this has anything to do with hardware monitoring chips
because hwmon_device_register_for_thermal() sets the chip argument of
__hwmon_device_register() to NULL, so the chip information is missing
in this particular case. The underlying hardware may or may not be a
multi-channel hardware monitoring chip, that is hard to tell in
general.
In the particular case of ACPI thermal zones, they each correspond to
a different platform device and regarding those as different channels
of the same hardware monitoring chip is kind of a stretch IMV (they
may even be located at different places in the device hierarchy).
Regardless, it should be possible to remove each of them cleanly
because they are handled by the driver independently.