Re: [PATCH] scsi: sd: add runtime pm to open / release

From: Alan Stern
Date: Fri Jun 26 2020 - 11:44:44 EST


On Fri, Jun 26, 2020 at 08:07:51AM -0700, Bart Van Assche wrote:
> On 2020-06-25 01:16, Martin Kepplinger wrote:
> > here's roughly what happens when enabling runtime PM in sysfs (again,
> > because sd_probe() calls autopm_put() and thus allows it):
> >
> > [ 27.384446] sd 0:0:0:0: scsi_runtime_suspend
> > [ 27.432282] blk_pre_runtime_suspend
> > [ 27.435783] sd_suspend_common
> > [ 27.438782] blk_post_runtime_suspend
> > [ 27.442427] scsi target0:0:0: scsi_runtime_suspend
> > [ 27.447303] scsi host0: scsi_runtime_suspend
> >
> > then I "mount /dev/sda1 /mnt" and none of the resume() functions get
> > called. To me it looks like the sd driver should initiate resuming, and
> > that's not implemented.
> >
> > what am I doing wrong or overlooking? how exactly does (or should) the
> > block layer initiate resume here?
>
> As far as I know runtime power management support in the sd driver is working
> fine and is being used intensively by the UFS driver. The following commit was
> submitted to fix a bug encountered by a UFS developer: 05d18ae1cc8a ("scsi:
> pm: Balance pm_only counter of request queue during system resume") # v5.7.

I just looked at that commit for the first time.

Instead of making the SCSI driver do the work of deciding what routine to
call, why not redefine blk_set_runtime_active(q) to simply call
blk_post_runtime_resume(q, 0)? Or vice versa: if err == 0 have
blk_post_runtime_resume call blk_set_runtime_active?

After all, the two routines do almost the same thing -- and the bug
addressed by this commit was caused by the difference in their behaviors.

If the device was already runtime-active during the system suspend, doing
an extra clear of the pm_only counter won't hurt anything.
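
To make the suggestion concrete, here is a rough userspace model of the
idea. It is not kernel code: the toy_* names and the struct are made up
for illustration, locking and the autosuspend calls are left out, and
clamping the counter at zero is my assumption standing in for "an extra
clear won't hurt" (the real counter handling would need to be checked).

```c
enum toy_rpm_status { TOY_RPM_ACTIVE, TOY_RPM_SUSPENDED };

/* Hypothetical stand-in for the request queue; only two fields modelled. */
struct toy_queue {
	enum toy_rpm_status rpm_status;
	int pm_only;	/* nonzero: only PM requests may enter the queue */
};

/* Assumption of this sketch: clearing an already-zero counter is a no-op. */
static void toy_clear_pm_only(struct toy_queue *q)
{
	if (q->pm_only > 0)
		q->pm_only--;
}

/* Model of the runtime-resume completion path. */
static void toy_post_runtime_resume(struct toy_queue *q, int err)
{
	if (!err) {
		q->rpm_status = TOY_RPM_ACTIVE;
		toy_clear_pm_only(q);
	} else {
		/* Resume failed: stay suspended, leave pm_only alone. */
		q->rpm_status = TOY_RPM_SUSPENDED;
	}
}

/* Proposed shape: the system-resume helper is just the err == 0 case. */
static void toy_set_runtime_active(struct toy_queue *q)
{
	toy_post_runtime_resume(q, 0);
}
```

In this model a queue that starts out suspended with pm_only == 1 ends
up active with pm_only == 0 after toy_set_runtime_active(), and calling
it a second time changes nothing.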

> I'm not sure which bug is causing trouble on your setup but I think it's likely
> that the root cause is somewhere else than in the block layer, the SCSI core
> or the SCSI sd driver.
>
> Bart.

Martin's best approach would be to add some debugging code to find out why
blk_queue_enter() isn't calling blk_pm_request_resume(), or why that call
doesn't lead to pm_request_resume().
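
As a rough illustration of the kind of tracing meant here, below is a
simplified userspace model of that decision chain with a debug message
at each branch. It is only a sketch: the dbg_* names, the struct fields
and the conditions are assumptions for illustration, not copies of the
kernel routines, and the real code would need to be read to see which
checks actually apply.

```c
#include <stdbool.h>
#include <stdio.h>

enum dbg_rpm { DBG_RPM_ACTIVE, DBG_RPM_SUSPENDING, DBG_RPM_SUSPENDED };

/* Hypothetical stand-in for the request queue. */
struct dbg_queue {
	bool has_dev;		/* models "runtime PM was set up for this queue" */
	enum dbg_rpm rpm_status;
	int pm_only;		/* nonzero: only PM requests may enter */
	int resume_requests;	/* counts modelled pm_request_resume() calls */
};

/* Model of the resume-request step, reporting each reason for doing nothing. */
static void dbg_pm_request_resume(struct dbg_queue *q)
{
	if (!q->has_dev) {
		fprintf(stderr, "dbg: queue has no PM device, no resume\n");
		return;
	}
	if (q->rpm_status == DBG_RPM_ACTIVE) {
		fprintf(stderr, "dbg: queue already active, no resume\n");
		return;
	}
	fprintf(stderr, "dbg: requesting resume\n");
	q->resume_requests++;
}

/* Model of queue entry; returns true if the request may proceed now. */
static bool dbg_queue_enter(struct dbg_queue *q, bool pm_request)
{
	if (pm_request || q->pm_only == 0) {
		fprintf(stderr, "dbg: enter ok (pm_request=%d pm_only=%d)\n",
			pm_request, q->pm_only);
		return true;
	}
	fprintf(stderr, "dbg: blocked, pm_only=%d\n", q->pm_only);
	dbg_pm_request_resume(q);
	return false;
}
```

The model shows two ways the resume can be skipped: a request treated
as a PM request never reaches the resume step at all, and a queue
without a PM device attached reaches it but does nothing. Messages in
the corresponding places in the real code should show which case, if
either, applies.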

Alan Stern