Re: [PATCH 2/5] thermal/core: Reset cooling state during cooling device unregistration

From: Rafael J. Wysocki
Date: Mon Mar 27 2023 - 11:13:32 EST


On Mon, Mar 27, 2023 at 4:50 PM Zhang, Rui <rui.zhang@xxxxxxxxx> wrote:
>
> On Fri, 2023-03-24 at 14:19 +0100, Rafael J. Wysocki wrote:
> > On Fri, Mar 24, 2023 at 8:08 AM Zhang Rui <rui.zhang@xxxxxxxxx>
> > wrote:
> > > When unregistering a cooling device, it is possible that the
> > > cooling device has been activated. And once the cooling device is
> > > unregistered, no one will deactivate it anymore.
> > >
> > > Reset cooling state during cooling device unregistration.
> > >
> > > Signed-off-by: Zhang Rui <rui.zhang@xxxxxxxxx>
> > > ---
> > > In theory, the problem that this patch fixes can be triggered on a
> > > platform with ACPI Active cooling by
> > > 1. overheating the system to trigger ACPI active cooling
> > > 2. unloading the ACPI fan driver
> > > 3. checking whether the fan is still spinning
> > > But I don't have such a system, so I have not reproduced the problem;
> > > I only did a build & boot test.
> >
> > So I'm not sure if this change is actually safe.
> >
> > In the example above, the system will still need the fan to spin
> > after
> > the ACPI fan driver is unloaded in order to cool down, won't it?
>
> Then we can argue that the ACPI fan driver should not be unloaded in
> this case.

I don't think that whether or not the driver is expected to be
unloaded at a given time has any bearing on how it should behave when
actually unloaded.

Leaving the cooling device in its current state is "safe" from the
thermal control perspective, but it may affect the general user
experience (which may include performance too) going forward, so there
is a tradeoff.

You can argue that even if the cooling device is reset on driver
removal, there should be another thermal control mechanism in place
that will handle the overheat condition instead, but that mechanism
may be an emergency system shutdown.
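
To make the tradeoff concrete, here is a minimal userspace sketch of the
two behaviors being compared. It is only a model, not kernel code: the
struct, the policy enum and the function names are all hypothetical and
do not correspond to the thermal core's actual data structures or API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a cooling device (NOT the kernel's struct). */
struct model_cdev {
	unsigned long cur_state;	/* 0 = inactive, higher = more cooling */
	unsigned long max_state;	/* deepest cooling state supported */
	bool registered;
};

/* Hypothetical policy knob: what to do with the state on unregistration. */
enum model_unregister_policy {
	MODEL_LEAVE_STATE,	/* keep cooling as-is: "safe" thermally */
	MODEL_RESET_STATE,	/* drop to state 0, as the patch proposes */
};

/* Set the cooling state, clamping it to the device's maximum. */
static void model_cdev_set_state(struct model_cdev *cdev, unsigned long state)
{
	assert(cdev != NULL);
	if (state > cdev->max_state)
		state = cdev->max_state;
	cdev->cur_state = state;
}

/* Unregister the device, optionally resetting its cooling state first. */
static void model_cdev_unregister(struct model_cdev *cdev,
				  enum model_unregister_policy policy)
{
	assert(cdev != NULL);
	if (policy == MODEL_RESET_STATE)
		cdev->cur_state = 0;	/* nothing cools the system after this */
	cdev->registered = false;
}
```

With MODEL_LEAVE_STATE an active fan keeps spinning after removal and
nothing will ever turn it down again; with MODEL_RESET_STATE it stops,
and an overheat condition has to be handled by whatever other mechanism
exists.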

What do the other cooling device drivers do in general when they get removed?

> Actually, this is the same situation as patch 1/5.
> Patch 1/5 fixes the problem of the cooling state not being restored to 0
> when unloading the thermal driver, and this fixes the same problem when
> unloading the cooling device driver.

Right, it is analogous.