Re: [PATCH v5] scsi: ufs: core: handle PM commands timeout before SCSI EH
From: Peter Wang (王信友)
Date: Mon Jun 08 2026 - 05:06:35 EST
On Fri, 2026-06-05 at 19:20 +0800, Hongjie Fang wrote:
> A PM START STOP sent from the UFS well-known LU resume path can race
> with SCSI EH.
>
> The "wl resume" task flow is:
>   __ufshcd_wl_resume()
>     ufshcd_set_dev_pwr_mode(UFS_ACTIVE_PWR_MODE)
>       ufshcd_execute_start_stop()
>         scsi_execute_cmd()
>           blk_execute_rq()          <-- wait
>           scsi_check_passthrough()  <-- may retry START STOP
>
> If the first START STOP times out, SCSI EH may already have recovered
> the link and reset the device before scsi_execute_cmd() returns:
>   scsi_timeout()
>     scsi_eh_scmd_add()
>   scsi_error_handler()
>     scsi_unjam_host()
>       scsi_eh_ready_devs()
>         scsi_eh_host_reset()
>           ufshcd_eh_host_reset_handler()
>             if (hba->pm_op_in_progress)
>               ufshcd_link_recovery()
>                 ufshcd_device_reset()
>                 ufshcd_host_reset_and_restore()
>             ...
>       scsi_eh_flush_done_q()  <-- wakeup "wl resume" task
>     ...                       <-- host still in SHOST_RECOVERY
>     scsi_restart_operations()
>
> A later passthrough retry can then run while the host is still in
> SHOST_RECOVERY and hit the SCMD_FAIL_IF_RECOVERING path:
>   scsi_queue_rq()
>     if (scsi_host_in_recovery(shost) &&
>         cmd->flags & SCMD_FAIL_IF_RECOVERING)
>       return BLK_STS_OFFLINE
>
> That retry completes with DID_ERROR or DID_NO_CONNECT even though EH
> may already have restored the device to an operational ACTIVE state.
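
The queue-time check behind this failure can be modelled in user space. This is only a sketch of the condition quoted above: every qmodel_* name is a hypothetical stand-in, not a kernel definition, and the real scsi_queue_rq() does much more than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the BLK_STS_* values; not the kernel's. */
enum qmodel_status { QMODEL_STS_OK, QMODEL_STS_OFFLINE };

/* Hypothetical flag modelling SCMD_FAIL_IF_RECOVERING. */
#define QMODEL_FAIL_IF_RECOVERING (1u << 0)

struct qmodel_host {
	bool in_recovery;	/* models the host being in SHOST_RECOVERY */
};

struct qmodel_cmd {
	unsigned int flags;
};

/*
 * Reject a command at queue time when the host is still recovering and
 * the command asked to fail rather than wait for recovery to finish.
 */
static enum qmodel_status qmodel_queue_rq(const struct qmodel_host *host,
					  const struct qmodel_cmd *cmd)
{
	assert(host != NULL && cmd != NULL);

	if (host->in_recovery && (cmd->flags & QMODEL_FAIL_IF_RECOVERING))
		return QMODEL_STS_OFFLINE;

	return QMODEL_STS_OK;
}
```

In this model the retry fails only because in_recovery is still set when it is queued. The same command queued after the flag clears goes through, which is why a retry issued between scsi_eh_flush_done_q() and scsi_restart_operations() is the problematic case.
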
>
> Handle these PM timeouts directly from ufshcd_eh_timed_out() instead.
> After ufshcd_link_recovery(), complete the timed-out command
> immediately if it has not been completed already.
>
> For regular SCSI commands, complete them with DID_REQUEUE to match
> the existing MCQ force-completion semantics and allow
> scsi_execute_cmd() to retry if needed. For reserved internal
> device-management commands, finish the request with DID_TIME_OUT
> without calling ufshcd_release_scsi_cmd() since those commands use
> different resource lifetime rules.
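
The completion policy described above can be sketched as a small user-space model. All cmodel_* names are hypothetical and this is not the patch's code. The "released" field stands in for ufshcd_release_scsi_cmd() having run, and releasing on the regular-command path is an assumption inferred from the wording above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the DID_* host bytes named above. */
enum cmodel_result { CMODEL_NONE, CMODEL_REQUEUE, CMODEL_TIME_OUT };

struct cmodel_cmd {
	bool reserved;		/* reserved internal device-management command */
	bool completed;		/* already completed, e.g. during link recovery */
	bool released;		/* models ufshcd_release_scsi_cmd() having run */
	enum cmodel_result result;
};

/*
 * Complete a timed-out PM command after link recovery, unless recovery
 * already completed it.
 */
static void cmodel_complete_after_recovery(struct cmodel_cmd *cmd)
{
	assert(cmd != NULL);

	if (cmd->completed)
		return;		/* nothing left to do */

	if (cmd->reserved) {
		/* Different resource lifetime rules: no release here. */
		cmd->result = CMODEL_TIME_OUT;
	} else {
		/* Assumption: regular commands are released and requeued. */
		cmd->result = CMODEL_REQUEUE;
		cmd->released = true;
	}
	cmd->completed = true;
}
```

The early return reflects the "if it has not been completed already" condition, so a command that recovery has finished is never completed twice in this model.
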
>
> The system_suspending flag is no longer needed because PM command
> timeout handling now uses pm_op_in_progress.
>
> Fixes: b8c3a7bac9b6 ("scsi: ufs: Have midlayer retry start stop errors")
> Signed-off-by: Hongjie Fang <hongjiefang@xxxxxxxxxxxx>
> ---
Thanks for fixing this bug.
Reviewed-by: Peter Wang <peter.wang@xxxxxxxxxxxx>