On 2020-04-29 22:40, Can Guo wrote:
On 2020-04-30 13:08, Bart Van Assche wrote:
On 2020-04-29 21:10, Can Guo wrote:
During system resume, scsi_resume_device() decreases a request queue's
pm_only counter if the scsi device was quiesced before. But after that,
if the scsi device's RPM status is RPM_SUSPENDED, the pm_only counter is
still held (non-zero). Current scsi resume hook only sets the RPM status
of the scsi device and its request queue to RPM_ACTIVE, but leaves the
pm_only counter unchanged. This may make the request queue's pm_only
counter remain non-zero after resume hook returns, hence those who are
waiting on the mq_freeze_wq would never be woken up. Fix this by calling
blk_post_runtime_resume() if pm_only is non-zero to balance the pm_only
counter which is held by the scsi device's RPM ops.
How was this issue discovered? How has this patch been tested?
As the issue was found after system resumes, so the issue was discovered
during system suspend/resume test, and it is very easy to be replicated.
After system resumes, if this issue hits some scsi devices, all bios sent
to their request queues are blocked, which may cause a system hang if the
scsi devices are vital to system functionality.
To make sure the patch work well, we have tested system suspend/resume
and made sure no system hang happen due to request queues got blocked
by imbalanced pm_only counter.
Thanks, that's very interesting information. My concern with this patch
is that the power management code is not the only caller of
blk_set_pm_only() / blk_clear_pm_only(). E.g. the SCSI SPI code also
calls scsi_device_quiesce() and scsi_device_resume(). These last
functions call blk_set_pm_only() and blk_clear_pm_only(). More calls of
scsi_device_quiesce() and scsi_device_resume() might be added in the future.
Has it been considered to test directly whether a SCSI device has been
runtime suspended instead of relying on blk_queue_pm_only()? How about
using pm_runtime_status_suspended() or adding a function in
block/blk-pm.h that checks whether q->rpm_status == RPM_SUSPENDED?
Thanks,
Bart.