Re: [PATCH 0/2] PM: runtime: Fix potential I/O hang

From: Rafael J. Wysocki

Date: Wed Nov 26 2025 - 12:22:04 EST


On Wed, Nov 26, 2025 at 5:59 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
>
> On Wed, Nov 26, 2025 at 4:48 PM Bart Van Assche <bvanassche@xxxxxxx> wrote:
> >
> > On 11/26/25 3:31 AM, Rafael J. Wysocki wrote:
> > > Please address the issue differently.
> >
> > It seems unfortunate to me that __pm_runtime_barrier() can cause pm_request_resume() to hang.
>
> I wouldn't call it a hang.
>
> __pm_runtime_barrier() removes the work item queued by
> pm_request_resume(), but at the time when it is called, which is
> device_suspend_late(), the work item queued by pm_request_resume()
> cannot make progress anyway. It will only be able to make progress
> when the PM workqueue is unfrozen at the end of the system resume
> transition.
>
> > Would it be safe to remove the
> > cancel_work_sync() call from __pm_runtime_barrier() since
> > pm_runtime_work() calls functions that check disable_depth
> > when processing RPM_REQ_SUSPEND and RPM_REQ_AUTOSUSPEND? Would
> > this be sufficient to fix the reported deadlock?
>
> If you want the resume work item to survive the system suspend/resume
> cycle, __pm_runtime_disable() may be changed to make that happen, but
> this still will not allow the work to make progress until the system
> resume ends.
>
> I'm not sure if this would help to address the issue at hand though.

I actually have a better idea: Why don't we resume all devices that
have runtime resume work items pending at the time when
device_suspend() is called?

Arguably, somebody wanted them to runtime-resume, so they should be
resumed before being prepared for system suspend and that will
eliminate the issue at hand (because devices cannot suspend during
system suspend/resume).