Re: [PATCH v2] scsi: core: pair EH runtime PM get/put
From: Alan Stern
Date: Mon May 25 2026 - 11:12:28 EST
On Mon, May 25, 2026 at 02:45:56PM +0800, Hongjie Fang wrote:
> shost->eh_noresume is currently consulted twice in one error handling
> iteration: once before scsi_autopm_get_host() and once again before
> scsi_autopm_put_host().
>
> That is racy when a PM-triggered error path flips shost->eh_noresume while
> the SCSI EH thread is still running.
>
> The problem flow looks like this:
> PM path
> ufshcd_set_dev_pwr_mode()
> shost->eh_noresume = 1
> ufshcd_execute_start_stop <-- trigger EH
> ...
> shost->eh_noresume = 0
>
> EH path
> scsi_error_handler()
> if (!shost->eh_noresume)
> scsi_autopm_get_host() <-- skipped
> ...
> if (!shost->eh_noresume)
> scsi_autopm_put_host() <-- executed later
>
> In that case one EH iteration can skip autoresume on entry and still drop
> a runtime PM reference on exit. That leaves an unmatched runtime PM put
> and can trigger a runtime PM usage count underflow.
>
> Fix this by calling scsi_autopm_put_host() only if the same iteration
> successfully acquired a runtime PM reference through the
> scsi_autopm_get_host().
>
> Fixes: ae0751ffc77e ("[SCSI] add flag to skip the runtime PM calls on the host")
> Signed-off-by: Hongjie Fang <hongjiefang@xxxxxxxxxxxx>
Acked-by: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>
> ---
> drivers/scsi/scsi_error.c | 19 ++++++++++++-------
> 1 file changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 147127fb4db9..faa5cc818e5b 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -2342,6 +2342,7 @@ static void scsi_unjam_host(struct Scsi_Host *shost)
> int scsi_error_handler(void *data)
> {
> struct Scsi_Host *shost = data;
> + bool autopm_get;
>
> /*
> * We use TASK_INTERRUPTIBLE so that the thread is not
> @@ -2383,12 +2384,16 @@ int scsi_error_handler(void *data)
> * what we need to do to get it up and online again (if we can).
> * If we fail, we end up taking the thing offline.
> */
> - if (!shost->eh_noresume && scsi_autopm_get_host(shost) != 0) {
> - SCSI_LOG_ERROR_RECOVERY(1,
> - shost_printk(KERN_ERR, shost,
> - "scsi_eh_%d: unable to autoresume\n",
> - shost->host_no));
> - continue;
> + autopm_get = false;
> + if (!shost->eh_noresume) {
> + if (scsi_autopm_get_host(shost) != 0) {
> + SCSI_LOG_ERROR_RECOVERY(1,
> + shost_printk(KERN_ERR, shost,
> + "scsi_eh_%d: unable to autoresume\n",
> + shost->host_no));
> + continue;
> + }
> + autopm_get = true;
> }
>
> if (shost->transportt->eh_strategy_handler)
> @@ -2407,7 +2412,7 @@ int scsi_error_handler(void *data)
> * which are still online.
> */
> scsi_restart_operations(shost);
> - if (!shost->eh_noresume)
> + if (autopm_get)
> scsi_autopm_put_host(shost);
> }
> __set_current_state(TASK_RUNNING);
> --
> 2.25.1
>