Re: [PATCH v20 1/2] scsi: ufs: Enable power management for wlun

From: Adrian Hunter
Date: Fri Apr 23 2021 - 04:02:11 EST


On 23/04/21 9:18 am, Adrian Hunter wrote:
> On 23/04/21 7:23 am, Adrian Hunter wrote:
>> On 22/04/21 7:38 pm, Asutosh Das (asd) wrote:
>>> On 4/20/2021 12:42 AM, Adrian Hunter wrote:
>>>> On 20/04/21 7:15 am, Adrian Hunter wrote:
>>>>> On 20/04/21 12:53 am, Asutosh Das (asd) wrote:
>>>>>> On 4/19/2021 11:37 AM, Adrian Hunter wrote:
>>>>>>> On 16/04/21 10:49 pm, Asutosh Das wrote:
>>>>>>>>
>>>>>>>> Co-developed-by: Can Guo <cang@xxxxxxxxxxxxxx>
>>>>>>>> Signed-off-by: Can Guo <cang@xxxxxxxxxxxxxx>
>>>>>>>> Signed-off-by: Asutosh Das <asutoshd@xxxxxxxxxxxxxx>
>>>>>>>> ---
>>>>>>>
>>>>>>> I came across 3 issues while testing.  See comments below.
>>>>>>>
>>>>>> Hi Adrian
>>>>>> Thanks for the comments.
>>>>>>> <SNIP>
>>>>>>>
>>>>>>>> @@ -5794,7 +5839,7 @@ static void ufshcd_err_handling_unprepare(struct ufs_hba *hba)
>>>>>>>>        if (ufshcd_is_clkscaling_supported(hba))
>>>>>>>>            ufshcd_clk_scaling_suspend(hba, false);
>>>>>>>>        ufshcd_clear_ua_wluns(hba);
>>>>>>>
>>>>>>> ufshcd_clear_ua_wluns() deadlocks trying to clear UFS_UPIU_RPMB_WLUN
>>>>>>> if sdev_rpmb is suspended and sdev_ufs_device is suspending.
>>>>>>> e.g. ufshcd_wl_suspend() is waiting on host_sem while ufshcd_err_handler()
>>>>>>> is running, at which point sdev_rpmb has already suspended.
>>>>>>>
>>>>>> Umm, I didn't understand this deadlock.
>>>>>> When you say, sdev_rpmb is suspended, does it mean runtime_suspended?
>>>>>> sdev_ufs_device is suspending - this can't be runtime_suspending, while ufshcd_err_handling_unprepare is running.
>>>>>>
>>>>>> If you've a call-stack of this deadlock, please can you share it with me. I'll also try to reproduce this.
>>>>>
>>>>> Yes it is system suspend. sdev_rpmb has suspended, sdev_ufs_device is waiting on host_sem.
>>>>> ufshcd_err_handler() holds host_sem. ufshcd_clear_ua_wlun(UFS_UPIU_RPMB_WLUN) gets stuck.
>>>>> I will get some call-stacks.
>>>>
>>> Hi Adrian,
>>>
>>> Thanks for the call stacks.
>>> From the current information, I can't say for sure why it'd get stuck in blk_queue_enter().
>>
>> I presume SCSI is leaving the RPMB WLUN device runtime suspended and consequently the queue status is RPM_SUSPENDED
>>
>>>
>>> I tried reproducing this issue on my setup yesterday but couldn't.
>>> Here's what I did:
>>> 1. sdev_rpmb is RPM_SUSPENDED, checked before initiating system suspend
>>> 2. sdev_ufs_device is RPM_RESUMED
>>> 3. I triggered system suspend (echo mem > /sys/power/state) and scheduled the error handler from ufshcd_wl_suspend().
>>> 4. Waited until error handler ran and then ufshcd_wl_suspend() blocks on host_sem.
>>> 5. The ufshcd_clear_wa_wlun(UFS_UPIU_RPMB_WLUN) went through fine.
>>>
>>> Do you've some specific steps to reproduce this or a script, perhaps? If so, please can you share it with me. I will try again.
>>
>> I was using a device that gives occasional errors, but I will what see I can do.
>
> I have attached 2 patches to reproduce the issue, and the test script I used.
> Note it is using down_timeout() so it doesn't lock up completely.
> The kernel messages are:
>
> [ 79.385456] ufshcd_wl_suspend: doing ufshcd_schedule_eh_work
> [ 79.385511] ufshcd_wl_suspend: sleeping 1000 ms
> [ 79.386916] ufshcd_err_handler: start
> [ 79.386979] ufshcd_err_handler: got sem
> [ 79.387053] ufshcd_err_handling_unprepare: start
> [ 79.387302] ufshcd_clear_ua_wluns: before ufshcd_clear_ua_wlun(hba, UFS_UPIU_RPMB_WLUN)
> [ 80.435878] ufshcd_wl_suspend: trying to get sem
> [ 85.683876] ufshcd_wl_suspend: failed to get sem
> [ 85.683993] ufs_device_wlun 0:0:0:49488: PM: failed to suspend async: error -22
> [ 85.686901] ufshcd_clear_ua_wluns: after ufshcd_clear_ua_wlun(hba, UFS_UPIU_RPMB_WLUN), ret 0
> [ 85.687025] ufshcd_err_handling_unprepare: finish
> [ 85.687090] ufshcd_err_handler: finish
>
>

I think we also need to runtime resume RPMB WLUN before system suspend.
e.g.

+static int ufshcd_rpmb_rpm_get_sync(struct ufs_hba *hba)
+{
+ return pm_runtime_get_sync(&hba->sdev_rpmb->sdev_gendev);
+}
+
+static int ufshcd_rpmb_rpm_put(struct ufs_hba *hba)
+{
+ return pm_runtime_put(&hba->sdev_rpmb->sdev_gendev);
+}
+
void ufshcd_resume_complete(struct device *dev)
{
struct ufs_hba *hba = dev_get_drvdata(dev);

+ if (hba->rpmb_complete_put) {
+ hba->rpmb_complete_put = false;
+ ufshcd_rpmb_rpm_put(hba);
+ }
if (hba->complete_put) {
hba->complete_put = false;
ufshcd_rpm_put(hba);
@@ -9611,6 +9625,11 @@ int ufshcd_suspend_prepare(struct device *dev)
return ret;
}
hba->complete_put = true;
+
+ if (hba->sdev_rpmb) {
+ ufshcd_rpmb_rpm_get_sync(hba);
+ hba->rpmb_complete_put = true;
+ }
return 0;
}

That also avoids another issue: if RPMB WLUN is runtime suspended at system resume, we have to skip clearing UAC, but SCSI PM will force the runtime status to RPM_ACTIVE after system resume, so the UAC never gets cleared in that case.

Furthermore, it seems better not to report errors from RPMB resume and instead let the error handler sort it out.
So, with the above change, we can simplify a bit:

-static int ufshcd_rpmb_runtime_resume(struct device *dev)
-{
- struct ufs_hba *hba = wlun_dev_to_hba(dev);
-
- if (hba->sdev_rpmb)
- return ufshcd_clear_rpmb_uac(hba);
- return 0;
-}
-
static int ufshcd_rpmb_resume(struct device *dev)
{
struct ufs_hba *hba = wlun_dev_to_hba(dev);

- if (hba->sdev_rpmb && !pm_runtime_suspended(dev))
- return ufshcd_clear_rpmb_uac(hba);
+ if (hba->sdev_rpmb)
+ ufshcd_clear_rpmb_uac(hba);
return 0;
}

static const struct dev_pm_ops ufs_rpmb_pm_ops = {
- SET_RUNTIME_PM_OPS(NULL, ufshcd_rpmb_runtime_resume, NULL)
+ SET_RUNTIME_PM_OPS(NULL, ufshcd_rpmb_resume, NULL)
SET_SYSTEM_SLEEP_PM_OPS(NULL, ufshcd_rpmb_resume)
};