Re: [PATCH RESEND v1] thermal: core: fix blocking in unregistering zone

From: Rafael J. Wysocki

Date: Wed Apr 08 2026 - 11:57:30 EST


On Wed, Apr 8, 2026 at 5:32 PM Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
>
> On 4/8/26 08:05, Rafael J. Wysocki wrote:
> > On Sun, Apr 5, 2026 at 5:34 AM Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
> >>
> >> On 4/4/26 10:38, Rafael J. Wysocki wrote:
> >>> On Sat, Apr 4, 2026 at 4:02 PM Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
> >>>>
> >>>> On 4/4/26 05:58, Rafael J. Wysocki wrote:
> >>>>> On Fri, Apr 3, 2026 at 4:20 PM Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
> >>>>>>
> >>>>>> On 4/3/26 05:52, Rafael J. Wysocki wrote:
> >>>>>> .[ ... ]
> >>>>>>> It appears to work for me, but I'm not sure if having multiple hwmon class
> >>>>>>> devices with the same value in the name attribute is fine.
> >>>>>>
> >>>>>> Like this ?
> >>>>>>
> >>>>>> $ cd /sys/class/hwmon
> >>>>>> $ grep . */name
> >>>>>> hwmon0/name:r8169_0_c00:00
> >>>>>> hwmon1/name:nvme
> >>>>>> hwmon2/name:nvme
> >>>>>> hwmon3/name:nct6687
> >>>>>> hwmon4/name:k10temp
> >>>>>> hwmon5/name:spd5118
> >>>>>> hwmon6/name:spd5118
> >>>>>> hwmon7/name:spd5118
> >>>>>> hwmon8/name:spd5118
> >>>>>> hwmon9/name:mt7921_phy0
> >>>>>
> >>>>> Yes.
> >>>>>
> >>>>>> Names such as "r8169_0_c00:00" and "mt7921_phy0" are actually overkill
> >>>>>> since the "sensors" command makes it
> >>>>>>
> >>>>>> r8169_0_c00:00-mdio-0
> >>>>>> Adapter: MDIO adapter
> >>>>>> temp1: +36.0°C (high = +120.0°C)
> >>>>>>
> >>>>>> mt7921_phy0-pci-0d00
> >>>>>> Adapter: PCI adapter
> >>>>>> temp1: +30.0°C
> >>>>>>
> >>>>>> essentially duplicating the device index.
> >>>>>
> >>>>> Well, with the patch posted by me, the output of sensors from a test
> >>>>> system looks like this:
> >>>>>
> >>>>> acpitz-acpi-0
> >>>>> Adapter: ACPI interface
> >>>>> temp1: +16.8°C
> >>>>>
> >>>>> pch_cannonlake-virtual-0
> >>>>> Adapter: Virtual device
> >>>>> temp1: +33.0°C
> >>>>>
> >>>>> acpitz-acpi-0
> >>>>> Adapter: ACPI interface
> >>>>> temp1: +27.8°C
> >>>>>
> >>>>> (some further data excluded), which is kind of confusing (note the
> >>>>> duplicate acpitz-acpi-0 entries with different values of temp1).
> >>>>>
> >>>>
> >>>> Yes, agreed, that is confusing. I would have expected the second one
> >>>> to be identified as "acpitz-acpi-1". Do they both have the same parent ?
> >>>
> >>> No, they don't.
> >>>
> >>> The parent of each of them is a thermal zone device and both parents
> >>> have the same "type" value.
> >>>
> >>>>> That could be disambiguated by concatenating the thermal zone ID
> >>>>> (possibly after a '_') to the name. Or the "temp*" things for thermal
> >>>>> zones of the same type could carry different numbers.
> >>>>>
> >>>>> A less attractive alternative would be to register a special virtual
> >>>>> device serving as a parent for all hwmon interfaces registered
> >>>>> automatically for thermal zones.
> >>>>
> >>>> If they all have the same parent, technically it should be a single
> >>>> hwmon device with multiple sensors, as in:
> >>>>
> >>>> acpitz-acpi-0
> >>>> Adapter: ACPI interface
> >>>> temp1: +16.8°C
> >>>> temp2: +27.8°C
> >>>
> >>> So somebody tried to make it look like that by registering hwmon
> >>> interfaces for all of the thermal zones of the same type under one of
> >>> them, but that (quite obviously) doesn't work.
> >>
> >> Not sure I understand why that doesn't work or why that is obvious,
> >> but I'll take you by your word (I would agree that the current
> >> _implementation_ looks problematic).
> >
> > For example, say that there are two ACPI thermal zones on a system
> >
> > /sys/devices/virtual/thermal/thermal_zone0/
> > /sys/devices/virtual/thermal/thermal_zone1/
> >
> > The current mainline code registers a hwmon class device for thermal_zone0 only:
> >
> > /sys/devices/virtual/thermal/thermal_zone0/hwmon0/
> >
> > because the type is "acpitz" for both of them, but it adds a sysfs
> > attribute that belongs to thermal_zone1 under it:
> >
> > /sys/devices/virtual/thermal/thermal_zone0/hwmon0/temp2_input
> >
> > There is also
> >
> > /sys/devices/virtual/thermal/thermal_zone0/hwmon0/temp1_input
> >
> > but it belongs to thermal_zone0.
> >
> > Interesting things happen when thermal_zone0 is removed, for example
> > because the ACPI thermal driver is unbound from the underlying
> > platform device. Namely, the removal code skips the removal of hwmon0
> > because of the temp2_input attribute belonging to thermal_zone1 which
> > effectively prevents thermal_zone0 removal from making progress.
> >
> > AFAICS, nothing particularly smart can be done to address this issue
> > while retaining the current design of the code. Reparenting hwmon0 to
> > thermal_zone1 may confuse user space as well as removing hwmon0 along
> > with temp2_input. That's why I think that this is a design issue.
> >
>
> The ACPI power meter driver has pretty much the same problem. A clear
> solution would require making hwmon sysfs attributes dynamic in nature
> (i.e., by adding the ability to change the visibility of attributes in
> runtime). I have started working on that, but did not have time to
> complete the work. The ACPI power meter driver uses a kludge around that:
> It unregisters the hwmon device whenever it gets a METER_NOTIFY_CONFIG
> event and re-registers it.
>
> Anyway, registering separate hwmon devices, one per thermal zone,
> is perfectly fine with me.

OK, I'll do that and see how it goes.

Thanks for the feedback!