On 10/26, Can Guo wrote:
On 2020-10-26 14:13, Jaegeuk Kim wrote:
> On 10/26, Can Guo wrote:
> > On 2020-10-24 23:06, Jaegeuk Kim wrote:
> > > From: Jaegeuk Kim <jaegeuk@xxxxxxxxxx>
> > >
> > > When giving a stress test which enables/disables clkgating, we hit
> > > device
> > > timeout sometimes. This patch avoids subtle racy condition to address
> > > it.
> > >
> > > If we use __ufshcd_release(), I've seen that gate_work can be called in
> > > parallel
> > > with ungate_work, which results in UFS timeout when doing hibern8.
> > > Should avoid it.
> > >
> >
> > I don't understand this comment. gate_work and ungate_work are
> > queued on
> > an ordered workqueue and an ordered workqueue executes at most one
> > work item
> > at any given time in the queued order. How can the two run in
> > parallel?
>
> When I hit UFS stuck, I saw this by clkgating tracepoint.
>
> - REQ_CLK_OFF
> - CLKS_OFF
> - REQ_CLK_OFF
> - REQ_CLKS_ON
> ..
>
I don't see how can you tell that the two works are running in parallel
just from above trace. May I know what is the exact error by "UFS timeout
when doing hibern8"?
By using __ufshcd_release() here, I do see one potential issue if your test
quickly toggles on/off of clk_gating - disable it, enable it, disable it and
enable it, which will cause that __ufshcd_release() being called twice,
meaning
we queue two gate_works back to back. So can you try below code and let me
know
if it helps or not? I am OK with your current change, but I would like to
understand the problem. Thanks.
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 1791bce..3eee438 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -2271,6 +2271,8 @@ static void ufshcd_gate_work(struct work_struct *work)
unsigned long flags;
spin_lock_irqsave(hba->host->host_lock, flags);
+ if (hba->clk_gating.state == CLKS_OFF)
+ goto rel_lock;
/*
* In case you are here to cancel this work the gating state
* would be marked as REQ_CLKS_ON. In this case save time by
This doesn't help. So, I checked this back again, and, like what you said, now
suspect __ufshcd_release() which changed state to REQ_CLKS_OFF on CLKS_OFF.
With the below change, I can see the issue anymore. Let me send v4.
---
drivers/scsi/ufs/ufshcd.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index b8f573a02713..cc8d5f0c3fdc 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -1745,7 +1745,8 @@ static void __ufshcd_release(struct ufs_hba *hba)
if (hba->clk_gating.active_reqs || hba->clk_gating.is_suspended ||
hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL ||
ufshcd_any_tag_in_use(hba) || hba->outstanding_tasks ||
- hba->active_uic_cmd || hba->uic_async_done)
+ hba->active_uic_cmd || hba->uic_async_done ||
+ hba->clk_gating.state == CLKS_OFF)
return;
hba->clk_gating.state = REQ_CLKS_OFF;
--
2.29.0.rc1.297.gfa9743e501-goog
Regards,
Can Guo.
> By using active_req, I don't see any problem.
>
> >
> > Thanks,
> >
> > Can Guo.
> >
> > > Signed-off-by: Jaegeuk Kim <jaegeuk@xxxxxxxxxx>
> > > ---
> > > drivers/scsi/ufs/ufshcd.c | 12 ++++++------
> > > 1 file changed, 6 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> > > index b8f573a02713..e0b479f9eb8a 100644
> > > --- a/drivers/scsi/ufs/ufshcd.c
> > > +++ b/drivers/scsi/ufs/ufshcd.c
> > > @@ -1807,19 +1807,19 @@ static ssize_t
> > > ufshcd_clkgate_enable_store(struct device *dev,
> > > return -EINVAL;
> > >
> > > value = !!value;
> > > +
> > > + spin_lock_irqsave(hba->host->host_lock, flags);
> > > if (value == hba->clk_gating.is_enabled)
> > > goto out;
> > >
> > > - if (value) {
> > > - ufshcd_release(hba);
> > > - } else {
> > > - spin_lock_irqsave(hba->host->host_lock, flags);
> > > + if (value)
> > > + hba->clk_gating.active_reqs--;
> > > + else
> > > hba->clk_gating.active_reqs++;
> > > - spin_unlock_irqrestore(hba->host->host_lock, flags);
> > > - }
> > >
> > > hba->clk_gating.is_enabled = value;
> > > out:
> > > + spin_unlock_irqrestore(hba->host->host_lock, flags);
> > > return count;
> > > }