Re: [PATCH 6.6 1/1] PM: sleep: Restore asynchronous device resume optimization

From: Greg KH
Date: Thu Sep 05 2024 - 05:38:46 EST


On Thu, Sep 05, 2024 at 05:34:33PM +0800, Yenchia Chen wrote:
> >> From: "Rafael J. Wysocki" <rafael.j.wysocki@xxxxxxxxx>
> >>
> >> commit 3e999770ac1c7c31a70685dd5b88e89473509e9c upstream.
> >>
> >> Before commit 7839d0078e0d ("PM: sleep: Fix possible deadlocks in core
> >> system-wide PM code"), the resume of devices that were allowed to resume
> >> asynchronously was scheduled before starting the resume of the other
> >> devices, so the former did not have to wait for the latter unless
> >> functional dependencies were present.
> >>
> >> Commit 7839d0078e0d removed that optimization in order to address a
> >> correctness issue, but it can be restored with the help of a new device
> >> power management flag, so do that now.
> >>
> >> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> >> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@xxxxxxxxxxxxxxx>
> >> Signed-off-by: Yenchia Chen <yenchia.chen@xxxxxxxxxxxx>
> >> ---
> >> drivers/base/power/main.c | 117 +++++++++++++++++++++-----------------
> >> include/linux/pm.h | 1 +
> >> 2 files changed, 65 insertions(+), 53 deletions(-)
>
> >Why does this need to be backported? What bug is it fixing?
>
> >confused,
>
> >greg k-h
>
> Below is the scenario in which we hit the issue:
> 1) Run 'echo 3 > /proc/sys/vm/drop_caches'
> and enter suspend immediately.
> 2) Power on the device; our driver tries to read from mmc after leaving
> the resume callback and gets stuck.
>
> We found that if we do not drop caches, mmc_blk_resume is called and
> the system works fine.
>
> If we drop caches before suspending, there is a high probability that
> mmc_blk_resume is not called and our driver gets stuck at filp_open.
>
> We are still trying to find the root cause, but with this patch it works.

I think you are getting lucky as this is just changing the order in
which things are resumed. Please find and fix the root problem.

> Since it has been merged in mainline, we'd like to know whether it is
> OK to merge it into stable.

It changes the behavior of the system overall, and doesn't really fix a
bug on its own, so I don't want to, sorry.

Please find the real problem in your driver, or the mmc subsystem.

thanks,

greg k-h