Re: [PATCH v7] scsi: ufs: Quiesce all scsi devices before shutdown

From: Can Guo
Date: Mon Aug 03 2020 - 08:06:23 EST

Slightly updated my comments

On 2020-08-03 19:50, Can Guo wrote:
Hi Stanley,

On 2020-08-03 18:04, Stanley Chu wrote:
Currently, I/O requests can still be submitted to the UFS device while
UFS is in its shutdown flow. This may lead to races such as the scenario
below, and the system may finally crash due to unclocked register accesses.

To fix this kind of issue, in ufshcd_shutdown():

1. Use pm_runtime_get_sync() instead of resuming the UFS device via
ufshcd_runtime_resume() "internally", so that the runtime PM framework
manages and prevents concurrent runtime operations by incoming I/O requests.

2. Explicitly quiesce all SCSI devices to block all I/O requests
after the device is resumed.

Example of racing scenario: While UFS device is runtime-suspended

Thread #1: Executing UFS shutdown flow, e.g.,

Thread #2: Executing runtime resume flow triggered by I/O request,
e.g., ufshcd_resume(UFS_RUNTIME_PM)

This breaks the assumption that UFS PM flows cannot run concurrently,
so unexpected races may occur.

Signed-off-by: Stanley Chu <stanley.chu@xxxxxxxxxxxx>
- Since v6:
- Do quiesce to all SCSI devices.
- Since v4:
- Use pm_runtime_get_sync() instead of resuming UFS device by
ufshcd_runtime_resume() "internally".
drivers/scsi/ufs/ufshcd.c | 27 ++++++++++++++++++++++-----
1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 307622284239..7cb220b3fde0 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -8640,6 +8640,7 @@ EXPORT_SYMBOL(ufshcd_runtime_idle);
int ufshcd_shutdown(struct ufs_hba *hba)
int ret = 0;
+ struct scsi_target *starget;

if (!hba->is_powered)
goto out;
@@ -8647,11 +8648,27 @@ int ufshcd_shutdown(struct ufs_hba *hba)
if (ufshcd_is_ufs_dev_poweroff(hba) && ufshcd_is_link_off(hba))
goto out;

- if (pm_runtime_suspended(hba->dev)) {
- ret = ufshcd_runtime_resume(hba);
- if (ret)
- goto out;
- }
+ /*
+ * Let runtime PM framework manage and prevent concurrent runtime
+ * operations with shutdown flow.
+ */
+ pm_runtime_get_sync(hba->dev);
+ /*
+ * Quiesce all SCSI devices to prevent any non-PM requests sending
+ * from block layer during and after shutdown.
+ *
+ * Here we can not use blk_cleanup_queue() since PM requests
+ * (with BLK_MQ_REQ_PREEMPT flag) are still required to be sent
+ * through block layer. Therefore SCSI command queued after the
+ * scsi_target_quiesce() call returned will block until
+ * blk_cleanup_queue() is called.
+ *
+ * Besides, scsi_target_"un"quiesce (e.g., scsi_target_resume) can
+ * be ignored since shutdown is one-way flow.
+ */
+ list_for_each_entry(starget, &hba->host->__targets, siblings)
+ scsi_target_quiesce(starget);

Sorry for pointing you to scsi_target_quiesce(); maybe the below is better.

shost_for_each_device(sdev, hba->host)

We may need to discuss this quiesce part more, since I missed something.

After we quiesce the SCSI devices, only PM requests are allowed, but that
is still not safe: [1] PM requests can still pass through, and [2] there can
be tasks/reqs already present in the doorbells before the devices are
quiesced. So the tasks/reqs in [1] and [2] can still be in flight in
parallel while ufshcd_suspend is running.

How about quiescing only the UFS device well-known SCSI device and calling
blk_mq_freeze_queue on the other SCSI devices? blk_mq_freeze_queue can
eliminate the risks mentioned in [1] and [2].

shost_for_each_device(sdev, hba->host) {
if (sdev == hba->sdev_ufs_device)
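
To make concerns [1] and [2] concrete, here is a toy userspace model (not
kernel code, and not a claim about how the block layer is implemented
internally). It only mirrors the behaviour as described in this mail: a
quiesced queue still admits PM requests and leaves earlier requests in
flight, while a frozen queue admits nothing and is drained first. All toy_*
names are hypothetical.

```c
#include <stdbool.h>

/* Toy model only: every name here is hypothetical, none is a kernel API. */
enum toy_queue_state { TOY_RUNNING, TOY_QUIESCED, TOY_FROZEN };

struct toy_queue {
	enum toy_queue_state state;
	unsigned int in_flight;	/* dispatched but not yet completed */
};

/* Returns true if the request is dispatched, false if it would block. */
static bool toy_submit(struct toy_queue *q, bool is_pm_request)
{
	switch (q->state) {
	case TOY_RUNNING:
		break;
	case TOY_QUIESCED:
		/* Concern [1]: PM requests still pass a quiesced queue. */
		if (!is_pm_request)
			return false;
		break;
	case TOY_FROZEN:
		/* A frozen queue admits nothing, PM or not. */
		return false;
	}
	q->in_flight++;
	return true;
}

static void toy_complete(struct toy_queue *q)
{
	if (q->in_flight)
		q->in_flight--;
}

/* Concern [2]: in this model, quiescing leaves in-flight requests alone. */
static void toy_quiesce(struct toy_queue *q)
{
	q->state = TOY_QUIESCED;
}

/* Freezing closes the gate, then drains whatever was already in flight. */
static void toy_freeze(struct toy_queue *q)
{
	q->state = TOY_FROZEN;
	while (q->in_flight)
		toy_complete(q);
}
```

In this model, a queue that was only quiesced can still have in_flight != 0
when the suspend path starts, whereas toy_freeze() returns with
in_flight == 0 and rejects every later submission.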

If blk_mq_freeze_queue is not allowed to be used by an LLD (I think we can
use it, as I recall Bart used it in one of his changes to UFS scaling),
we can use scsi_remove_device instead; it changes the SCSI device's state to
SDEV_DEL and calls blk_cleanup_queue.

We can also make changes like below. [1] makes sure no more PM requests are
sent to the SCSI devices, and [2] makes sure the doorbells are cleared before invoke

shost_for_each_device(sdev, hba->host) {
scsi_autopm_get_device(sdev); [1]

ufshcd_wait_for_doorbell_clr(hba, U64_MAX); [2]
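
As a rough illustration of step [2], the sketch below models a bounded wait
for the doorbells to clear in plain userspace C. It is not the driver's
implementation: the toy_* names, the poll-count budget (instead of a time
budget) and the simulated hardware that retires one slot per poll are all
assumptions made for the example.

```c
#include <errno.h>
#include <stdint.h>

/* Toy model only: hypothetical stand-in for the two doorbell registers. */
struct toy_hba {
	uint32_t tr_doorbell;	/* outstanding transfer request slots */
	uint32_t tm_doorbell;	/* outstanding task management slots */
};

/* Simulated hardware: retire the lowest outstanding slot in each doorbell. */
static void toy_hw_tick(struct toy_hba *hba)
{
	hba->tr_doorbell &= hba->tr_doorbell - 1;
	hba->tm_doorbell &= hba->tm_doorbell - 1;
}

/*
 * Poll until both doorbells read zero or the poll budget runs out.
 * Returns 0 when clear, -EBUSY when requests are still outstanding.
 */
static int toy_wait_for_doorbell_clr(struct toy_hba *hba, unsigned int max_polls)
{
	for (;;) {
		if (!hba->tr_doorbell && !hba->tm_doorbell)
			return 0;
		if (max_polls-- == 0)
			return -EBUSY;
		toy_hw_tick(hba);
	}
}
```

The point of the model is the ordering: the suspend path should only run
after the wait has returned 0, so nothing is still outstanding in the
doorbells when clocks and power are turned off.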

Please let me know which one you prefer, or if you have a better idea. Thanks!


Can Guo.

ret = ufshcd_suspend(hba, UFS_SHUTDOWN_PM);