Re: mhi resume failure on reboot with 6.13-rc2

From: Johan Hovold
Date: Thu Dec 19 2024 - 03:36:40 EST


On Thu, Dec 19, 2024 at 12:05:55AM +0530, Manivannan Sadhasivam wrote:
> On Wed, Dec 18, 2024 at 03:26:38PM +0100, Johan Hovold wrote:
> > On Wed, Dec 18, 2024 at 07:39:10PM +0530, Manivannan Sadhasivam wrote:
> > > On Wed, Dec 18, 2024 at 02:55:02PM +0100, Johan Hovold wrote:

> > > > But that's not going to happen as that reset is what is currently
> > > > causing the deadlock and which would simply be skipped if you switch to
> > > > pci_try_reset_function().
> > > >
> > >
> > > mhi_pci_runtime_resume() will queue the recovery_work() and return. So I was
> > > hoping that by the time pci_try_reset_function() is called, the lock would be
> > > available.
> >
> > We can't rely on luck with timings, and this is the very reason for the
> > deadlock I'm currently seeing (i.e. the recovery thread is still running
> > when another thread grabs the lock and waits for the recovery thread to
> > finish).
> >
> > Perhaps the recovery work should be done synchronously in the resume
> > handler to avoid such issues.
>
> Synchronously? How can that help when the recovery_work() cannot acquire the
> lock?

During system suspend, the PM core waits for any ongoing runtime resume
operations to complete before taking the device lock and suspending the
device.

Unfortunately, that's currently not the case during shutdown() where
those operations are reversed, so that would indeed need to be addressed
first.
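
A rough user-space model of the two orderings, for illustration only.
It assumes the wait is pm_runtime_barrier() and the lock is
device_lock(); the model_* names are made up and only record the call
order, they do not reproduce the PM core code:

```c
#include <stddef.h>

/* Steps whose relative order matters here; purely illustrative. */
enum model_step { STEP_RPM_BARRIER, STEP_DEVICE_LOCK, STEP_CALLBACK };

/* Small event log so the ordering can be inspected afterwards. */
struct model_log {
	enum model_step steps[4];
	size_t n;
};

static void model_record(struct model_log *log, enum model_step s)
{
	if (log->n < sizeof(log->steps) / sizeof(log->steps[0]))
		log->steps[log->n++] = s;
}

/* System suspend: wait for runtime PM first, then take the device lock. */
static void model_system_suspend(struct model_log *log)
{
	model_record(log, STEP_RPM_BARRIER);	/* models pm_runtime_barrier() */
	model_record(log, STEP_DEVICE_LOCK);	/* models device_lock() */
	model_record(log, STEP_CALLBACK);	/* models the suspend callback */
}

/* Shutdown: the device lock is taken before waiting for runtime PM. */
static void model_shutdown(struct model_log *log)
{
	model_record(log, STEP_DEVICE_LOCK);	/* models device_lock() */
	model_record(log, STEP_RPM_BARRIER);	/* models pm_runtime_barrier() */
	model_record(log, STEP_CALLBACK);	/* models the shutdown callback */
}
```

With the shutdown ordering, anything the barrier waits for must not
itself need the device lock, which is why a synchronous recovery in the
resume handler would not be safe there as things stand.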

But what the driver is currently doing looks highly questionable as it
returns success when it failed to resume the device (after scheduling
the asynchronous recovery work).
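
To make the concern concrete, here is a minimal user-space sketch of
the pattern. All names (model_dev, model_hw_resume() and so on) are
hypothetical stand-ins, not the driver's actual code, and the second
variant is only one possible alternative, not a tested fix:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver state. */
struct model_dev {
	bool resume_fails;		/* simulate a failed device resume */
	bool recovery_scheduled;	/* set once async recovery is queued */
};

/* Stand-in for the actual device resume; 0 or a negative errno. */
static int model_hw_resume(struct model_dev *dev)
{
	return dev->resume_fails ? -EIO : 0;
}

/* Stand-in for queueing the asynchronous recovery work. */
static void model_schedule_recovery(struct model_dev *dev)
{
	dev->recovery_scheduled = true;
}

/* Pattern described above: the failure is hidden from the caller. */
static int model_resume_current(struct model_dev *dev)
{
	if (model_hw_resume(dev)) {
		model_schedule_recovery(dev);
		return 0;	/* reports success although resume failed */
	}
	return 0;
}

/* Assumed alternative: still schedule recovery, but report the error. */
static int model_resume_propagate(struct model_dev *dev)
{
	int ret = model_hw_resume(dev);

	if (ret)
		model_schedule_recovery(dev);
	return ret;
}
```

In the first variant the caller cannot tell a resumed device from one
that is still waiting for recovery to run.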

Johan