Re: [PATCH v4 1/8] scsi: ufs: Flush exception event before suspend

From: Bart Van Assche
Date: Sat Jan 25 2020 - 22:30:21 EST

Next message: kbuild test robot: "[rcu:cna.2020.01.23a] BUILD SUCCESS 7ca26fc6716ad864a235f3818b1e559d5e882eda"
Previous message: Woody Suwalski: "[PATCH] fix 3.16 unknown rela relocation 4 error"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 2020-01-22 23:25, Can Guo wrote:
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index 1201578..c2de29f 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -4760,8 +4760,15 @@ static void ufshcd_slave_destroy(struct scsi_device *sdev)
> * UFS device needs urgent BKOPs.
> */
> if (!hba->pm_op_in_progress &&
> - ufshcd_is_exception_event(lrbp->ucd_rsp_ptr))
> - schedule_work(&hba->eeh_work);
> + ufshcd_is_exception_event(lrbp->ucd_rsp_ptr)) {
> + /*
> + * Prevent suspend once eeh_work is scheduled
> + * to avoid deadlock between ufshcd_suspend
> + * and exception event handler.
> + */
> + if (schedule_work(&hba->eeh_work))
> + pm_runtime_get_noresume(hba->dev);
> + }

Please combine the two logical tests with "&&" instead of nesting two
if-statements inside each other.

> break;
> case UPIU_TRANSACTION_REJECT_UPIU:
> /* TODO: handle Reject UPIU Response */
> @@ -5215,7 +5222,14 @@ static void ufshcd_exception_event_handler(struct work_struct *work)
>
> out:
> scsi_unblock_requests(hba->host);
> - pm_runtime_put_sync(hba->dev);
> + /*
> + * pm_runtime_get_noresume is called while scheduling
> + * eeh_work to avoid suspend racing with exception work.
> + * Hence decrement usage counter using pm_runtime_put_noidle
> + * to allow suspend on completion of exception event handler.
> + */
> + pm_runtime_put_noidle(hba->dev);
> + pm_runtime_put(hba->dev);
> return;
> }
>
> @@ -7901,6 +7915,7 @@ static int ufshcd_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
> goto enable_gating;
> }
>
> + flush_work(&hba->eeh_work);
> ret = ufshcd_link_state_transition(hba, req_link_state, 1);
> if (ret)
> goto set_dev_active;

I think this patch introduces a new race condition, namely the following:
- ufshcd_slave_destroy() tests pm_op_in_progress and reads the value
zero from that variable.
- ufshcd_suspend() sets hba->pm_op_in_progress to one.
- ufshcd_slave_destroy() calls schedule_work().

How about fixing this race condition by calling
pm_runtime_get_noresume() before checking pm_op_in_progress and by
reallowing resume if no work is scheduled?

Thanks,

Bart.

Next message: kbuild test robot: "[rcu:cna.2020.01.23a] BUILD SUCCESS 7ca26fc6716ad864a235f3818b1e559d5e882eda"
Previous message: Woody Suwalski: "[PATCH] fix 3.16 unknown rela relocation 4 error"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]