Re: [PATCH v5] scsi: ufs: core: handle PM commands timeout before SCSI EH

From: Peter Wang (王信友)

Date: Mon Jun 08 2026 - 05:06:35 EST


On Fri, 2026-06-05 at 19:20 +0800, Hongjie Fang wrote:
> A PM START STOP sent from the UFS well-known LU resume path can race with
> SCSI EH.
>
> The "wl resume" task flow is:
>   __ufshcd_wl_resume()
>     ufshcd_set_dev_pwr_mode(UFS_ACTIVE_PWR_MODE)
>       ufshcd_execute_start_stop()
>         scsi_execute_cmd()
>           blk_execute_rq           <-- wait
>           scsi_check_passthrough() <-- may retry START STOP
>
> If the first START STOP times out, SCSI EH may already have recovered the
> link and reset the device before scsi_execute_cmd() returns:
>   scsi_timeout()
>     scsi_eh_scmd_add()
>       scsi_error_handler()
>         scsi_unjam_host()
>           scsi_eh_ready_devs()
>             scsi_eh_host_reset()
>               ufshcd_eh_host_reset_handler()
>                 if (hba->pm_op_in_progress)
>                   ufshcd_link_recovery()
>                     ufshcd_device_reset()
>                     ufshcd_host_reset_and_restore()
>           ...
>           scsi_eh_flush_done_q()   <-- wakeup "wl resume" task
>         ...                        <-- host still in SHOST_RECOVERY
>         scsi_restart_operations()
>
> A later passthrough retry can then run while the host is still in
> SHOST_RECOVERY and hit the SCMD_FAIL_IF_RECOVERING path:
>   scsi_queue_rq()
>     if (scsi_host_in_recovery(shost) &&
>         cmd->flags & SCMD_FAIL_IF_RECOVERING)
>       return BLK_STS_OFFLINE
>
> That retry completes with DID_ERROR or DID_NO_CONNECT even though EH may
> already have restored the device to an operational ACTIVE state.
>
> Handle these PM timeouts directly from ufshcd_eh_timed_out() instead.
> After ufshcd_link_recovery(), complete the timed-out command immediately
> if it has not been completed already.
>
> For regular SCSI commands, complete them with DID_REQUEUE to match the
> existing MCQ force-completion semantics and allow scsi_execute_cmd() to
> retry if needed. For reserved internal device-management commands, finish
> the request with DID_TIME_OUT without calling ufshcd_release_scsi_cmd()
> since those commands use different resource lifetime rules.
>
> The system_suspending flag is no longer needed because PM command timeout
> handling now uses pm_op_in_progress.
>
> Fixes: b8c3a7bac9b6 ("scsi: ufs: Have midlayer retry start stop errors")
> Signed-off-by: Hongjie Fang <hongjiefang@xxxxxxxxxxxx>
> ---

Thanks for fixing this bug.
Reviewed-by: Peter Wang <peter.wang@xxxxxxxxxxxx>
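
For readers following the race, the queue-time check quoted above can be
modelled in isolation. This is only a user-space sketch, not kernel code:
every model_* and MODEL_* name below is a hypothetical stand-in for the
real SCSI midlayer types and flags, and only the shape of the condition is
taken from the quoted scsi_queue_rq() snippet.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; these are NOT the kernel definitions. */
enum model_host_state { MODEL_HOST_RUNNING, MODEL_HOST_RECOVERY };
enum model_status { MODEL_STS_OK, MODEL_STS_OFFLINE };

/* Assumed flag bit, standing in for SCMD_FAIL_IF_RECOVERING. */
#define MODEL_FAIL_IF_RECOVERING 0x1u

struct model_cmd {
	unsigned int flags;
};

/*
 * Same shape as the quoted check: the command is failed at queue time
 * only when the host is still in recovery AND the command asked to be
 * failed in that case. Anything else proceeds normally in this model.
 */
static enum model_status model_queue_rq(enum model_host_state host,
					const struct model_cmd *cmd)
{
	if (host == MODEL_HOST_RECOVERY &&
	    (cmd->flags & MODEL_FAIL_IF_RECOVERING))
		return MODEL_STS_OFFLINE;

	return MODEL_STS_OK;
}
```

In this model, a retry carrying the flag is rejected for as long as the
host state is still "recovery", regardless of whether the device itself
has already been brought back, which is the window the commit message
describes.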