Re: [PATCH v4] scsi: ufs: Quiesce all scsi devices before shutdown
From: Stanley Chu
Date: Fri Jul 31 2020 - 05:27:36 EST
Hi Can,
On Fri, 2020-07-31 at 16:58 +0800, Can Guo wrote:
> Hi Stanley,
>
> On 2020-07-31 16:22, Stanley Chu wrote:
> > Hi Can,
> >
> > On Mon, 2020-07-27 at 15:30 +0800, Can Guo wrote:
> >> Hi Stanley,
> >>
> >> On 2020-07-24 22:01, Stanley Chu wrote:
> >> > Currently I/O request could be still submitted to UFS device while
> >> > UFS is working on shutdown flow. This may lead to racing as below
> >> > scenarios and finally system may crash due to unclocked register
> >> > accesses.
> >> >
> >> > To fix this kind of issues, specifically quiesce all SCSI devices
> >> > before UFS shutdown to block all I/O request sending from block
> >> > layer.
> >> >
> >> > Example of racing scenario: While UFS device is runtime-suspended
> >> >
> >> > Thread #1: Executing UFS shutdown flow, e.g.,
> >> > ufshcd_suspend(UFS_SHUTDOWN_PM)
> >> > Thread #2: Executing runtime resume flow triggered by I/O request,
> >> > e.g., ufshcd_resume(UFS_RUNTIME_PM)
> >> >
> >>
> >> I don't quite get it, how can you prevent block layer PM from iniating
> >> hba runtime resume by quiescing the scsi devices? Block layer PM
> >> iniates hba async runtime resume in blk_queue_enter(). But quiescing
> >> the scsi devices can only prevent general I/O requests from passing
> >> through scsi_queue_rq() callback.
> >>
> >> Say hba is runtime suspended, if an I/O request to sda is sent from
> >> block layer (sda must be runtime suspended as well at this time),
> >> blk_queue_enter() initiates async runtime resume for sda. But since
> >> sda's parents are also runtime suspended, the RPM framework shall do
> >> runtime resume to the devices in the sequence hba->host->target->sda.
> >> In this case, ufshcd_resume() still runs concurrently, no?
> >>
> >
> > You are right. This patch can not fix the case you mentioned. It just
> > prevents "general I/O requests".
> >
> > So perhaps we also need below patch?
> >
> > #2 scsi: ufs: Use pm_runtime_get_sync in shutdown flow
> > https://patchwork.kernel.org/patch/10964097/
>
> That is what I am talking about, we definitely need this. Since
> you are already working on the fixes to the shutdown path, I will
> not upload my fixes (basically look same with yours). However, as
> regard for the new change, if pm_runtime_get_sync(hba->dev) < 0,
> hba can still be runtime ACTIVE, why directly goto out without a
> check of hba's runtime status?
>
Thanks for reminding this. Then I will fix it and resend both patches as
a new series to fix the shutdown path.
Thanks so much,
Stanley Chu