Re: [PATCH v1] thermal: core: Address thermal zone removal races with resume
From: Rafael J. Wysocki
Date: Thu Mar 26 2026 - 15:13:46 EST
On Thu, Mar 26, 2026 at 7:35 PM Mauricio Faria de Oliveira
<mfo@xxxxxxxxxx> wrote:
>
> On 2026-03-26 08:45, Rafael J. Wysocki wrote:
> > Address the first failing scenario by ensuring that no thermal work
> > items will be running when thermal_pm_notify_complete() is called.
> > For this purpose, first move the cancel_delayed_work() call from
> > thermal_zone_pm_complete() to thermal_zone_pm_prepare() to prevent
> > new work from entering the workqueue going forward. Next, switch
> > over to using a dedicated workqueue for thermal events and update
> > the code in thermal_pm_notify() to flush that workqueue after
> > thermal_pm_notify_prepare() has returned, which will take care of
> > all leftover thermal work already on the workqueue (that leftover
> > work would do nothing useful anyway because all of the thermal zones
> > have been flagged as suspended).
>
> Thanks for coming up with this alternative. I spent some time earlier
> today thinking of corner cases in which it might fail, and it held OK.
>
> However, slightly unrelated: apparently, flushing the workqueue in
> thermal_pm_notify() reintroduces the issue addressed by the Fixes:
> commit, but moving it from PM_POST_* to PM_*_PREPARE?

Note that the work in question will be thermal_zone_device_check(),
which simply calls thermal_zone_device_update(), which in turn
essentially invokes __thermal_zone_device_update() under tz->lock.

Thus thermal_zone_device_update() can only run as a whole before or
after thermal_zone_pm_complete() for the given zone because the latter
also acquires tz->lock and releases it at the end. If it runs before
the latter, it will be waited for because the latter will block on the
lock, but that happens even without the changes in the $subject patch.
If it runs after the latter, __thermal_zone_device_update() will see
that tz->state is not TZ_STATE_READY (because TZ_STATE_FLAG_SUSPENDED
is set) and it will bail out immediately.

So I don't see the problem here.
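
The before/after argument above can be modelled in user space. What
follows is only a rough sketch, not kernel code: struct zone,
zone_update(), zone_update_locked() and zone_mark_suspended() are
hypothetical stand-ins, a pthread mutex plays the role of tz->lock, and
the numeric values of the state constants are assumptions.

```c
#include <assert.h>
#include <pthread.h>

/* Assumed values, chosen for illustration only. */
#define TZ_STATE_READY          0u
#define TZ_STATE_FLAG_SUSPENDED (1u << 0)

/* Hypothetical stand-in for the relevant part of a thermal zone. */
struct zone {
	pthread_mutex_t lock;      /* plays the role of tz->lock */
	unsigned int state;        /* TZ_STATE_READY or state flags */
	int updates_done;          /* counts updates that did real work */
};

/* Models __thermal_zone_device_update(): the caller holds the lock. */
static void zone_update_locked(struct zone *tz)
{
	/* Bail out immediately unless the zone is ready. */
	if (tz->state != TZ_STATE_READY)
		return;

	tz->updates_done++;        /* the potentially slow part */
}

/* Models the work item: the whole update runs under the lock. */
static void zone_update(struct zone *tz)
{
	pthread_mutex_lock(&tz->lock);
	zone_update_locked(tz);
	pthread_mutex_unlock(&tz->lock);
}

/* Models the PM-side step that flags the zone under the same lock. */
static void zone_mark_suspended(struct zone *tz)
{
	pthread_mutex_lock(&tz->lock);
	tz->state |= TZ_STATE_FLAG_SUSPENDED;
	pthread_mutex_unlock(&tz->lock);
}
```

Because both sides take the same lock, an update either completes
before the flag is set or starts afterwards and returns without doing
any work; it cannot interleave with the flagging step.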

PM_POST_* is different because thermal_zone_device_resume() calls
__thermal_zone_device_update() when tz->state is TZ_STATE_READY, and
that may take time.

> IIUC, that issue is that __thermal_zone_device_update() might take long
> and thus block other thermal zones and other PM notifiers after thermal.
>
> Apparently, at least the latter also applies to PM_*_PREPARE?

Not at the point when flush_workqueue() is called.

> Say, a currently running work item (i.e., that cancel_delayed_work()
> cannot cancel) wins the race for tz->lock and doesn't see tz->state
> TZ_STATE_FLAG_SUSPENDED set, so it runs, and say it might take long.
>
> Now, the workqueue flush blocks on it, also taking long, which thus
> blocks other PM notifiers.
>
> > The second failing scenario is addressed by adding a tz->state check
> > to thermal_zone_device_resume() to prevent it from reinitializing
> > the poll_queue delayed work if the thermal zone is going away.
>
> This also held OK in the thinking of corner cases.

Thanks for the feedback!