RE: [PATCH V1] mmc: core: Enable force hw reset
From: Sarthak Garg (QUIC)
Date: Thu Jun 02 2022 - 14:12:15 EST
> -----Original Message-----
> From: Adrian Hunter <adrian.hunter@xxxxxxxxx>
> Sent: Wednesday, June 1, 2022 11:35 AM
> To: Sarthak Garg (QUIC) <quic_sartgarg@xxxxxxxxxxx>; Kamasali Satyanarayan
> (Consultant) (QUIC) <quic_kamasali@xxxxxxxxxxx>; quic_spathi
> <quic_spathi@xxxxxxxxxxx>; ulf.hansson@xxxxxxxxxx; axboe@xxxxxxxxx;
> avri.altman@xxxxxxx; kch@xxxxxxxxxx; CLoehle@xxxxxxxxxxxxxx;
> swboyd@xxxxxxxxxxxx; digetx@xxxxxxxxx; bigeasy@xxxxxxxxxxxxx; linux-
> mmc@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH V1] mmc: core: Enable force hw reset
>
> On 27/05/22 15:44, Adrian Hunter wrote:
> > On 25/05/22 10:06, Sarthak Garg (QUIC) wrote:
> >> Hi Adrian,
> >>
> >> Thanks for the review.
> >> Please find comments inline.
> >>
> >> Thanks,
> >> Sarthak
> >>
> >>> -----Original Message-----
> >>> From: Kamasali Satyanarayan (Consultant) (QUIC)
> >>> <quic_kamasali@xxxxxxxxxxx>
> >>> Sent: Tuesday, May 24, 2022 5:33 PM
> >>> To: 'Adrian Hunter' <adrian.hunter@xxxxxxxxx>; quic_spathi
> >>> <quic_spathi@xxxxxxxxxxx>; ulf.hansson@xxxxxxxxxx;
> >>> riteshh@xxxxxxxxxxxxxx; asutoshd@xxxxxxxxxxxxxx; axboe@xxxxxxxxx;
> >>> avri.altman@xxxxxxx; kch@xxxxxxxxxx; CLoehle@xxxxxxxxxxxxxx;
> >>> swboyd@xxxxxxxxxxxx; digetx@xxxxxxxxx; bigeasy@xxxxxxxxxxxxx;
> >>> linux-mmc@xxxxxxxxxxxxxxx; linux- kernel@xxxxxxxxxxxxxxx; Sarthak
> >>> Garg (QUIC) <quic_sartgarg@xxxxxxxxxxx>
> >>> Cc: Shaik Sajida Bhanu <sbhanu@xxxxxxxxxxxxxx>
> >>> Subject: RE: [PATCH V1] mmc: core: Enable force hw reset
> >>>
> >>> Hi,
> >>> These patches will be further taken by Sarthak.
> >>>
> >>> Thanks,
> >>> Satya
> >>>
> >>> -----Original Message-----
> >>> From: Adrian Hunter <adrian.hunter@xxxxxxxxx>
> >>> Sent: Wednesday, April 27, 2022 6:04 PM
> >>> To: quic_spathi <quic_spathi@xxxxxxxxxxx>; ulf.hansson@xxxxxxxxxx;
> >>> riteshh@xxxxxxxxxxxxxx; asutoshd@xxxxxxxxxxxxxx; axboe@xxxxxxxxx;
> >>> avri.altman@xxxxxxx; kch@xxxxxxxxxx; CLoehle@xxxxxxxxxxxxxx;
> >>> swboyd@xxxxxxxxxxxx; digetx@xxxxxxxxx; bigeasy@xxxxxxxxxxxxx; linux-
> >>> mmc@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> >>> Cc: Shaik Sajida Bhanu <sbhanu@xxxxxxxxxxxxxx>; Kamasali
> >>> Satyanarayan
> >>> (Consultant) (QUIC) <quic_kamasali@xxxxxxxxxxx>
> >>> Subject: Re: [PATCH V1] mmc: core: Enable force hw reset
> >>>
> >>> On 26/04/22 11:30, Srinivasarao Pathipati wrote:
> >>>> From: Shaik Sajida Bhanu <sbhanu@xxxxxxxxxxxxxx>
> >>>>
> >>>> During error recovery set need hw reset to handle ICE error where
> >>>> cqe reset is must.
> >>>
> >>> How do you get ICE errors? Doesn't it mean either the hardware is
> >>> broken or the configuration is broken?
> >>
> >> This patch is not intended for ice errors and will update the commit text in V2.
> >> Long back intermittent recovery failures were observed but after forcing
> hardware reset during error recovery we have no single instance of recovery
> failure. This have made recovery more robust for us.
> >> Any suggestions on how we can take it forward will be highly appreciated.
> >
> > We can definitely go forward, but with hopefully a little more
> > explanation first.
> >
> > It is preferable to be able to explain why changes are being made.
> >
> > Do you have any logs or other information on the recovery failures?
> > Are you able to reproduce the problem?
> >
> > I notice you always do mmc_blk_reset_success(). Does that mean you
> > sometimes need several resets in a row?
> >
> > A potential issue that I notice, is that the recovery does not
> > explicitly deal with the case that the card's command queue has been
> > disabled e.g. due to RPMB access or IOCTL commands. Are you using
> > either of those?
>
> Looking closer, the command queue is reenabled on the error path so that is not
> a concern, but I did send:
>
> https://lore.kernel.org/linux-mmc/20220531171922.76080-1-
> adrian.hunter@xxxxxxxxx/
>
> For your case, if it is not about ICE errors, why not add only:
>
> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index
> f4a1281658db..a2ee850a5c16 100644
> --- a/drivers/mmc/core/block.c
> +++ b/drivers/mmc/core/block.c
> @@ -1497,7 +1497,7 @@ void mmc_blk_cqe_recovery(struct mmc_queue *mq)
> pr_debug("%s: CQE recovery start\n", mmc_hostname(host));
>
> err = mmc_cqe_recovery(host);
> - if (err)
> + if (err || host->cqe_recovery_reset_always)
> mmc_blk_reset(mq->blkdata, host,
> MMC_BLK_CQE_RECOVERY);
> mmc_blk_reset_success(mq->blkdata, MMC_BLK_CQE_RECOVERY);
>
> And then just set it in your host controller driver probe function.
>
> host->cqe_recovery_reset_always = true;
>
> >
Thanks Adrian for considering and posting https://lore.kernel.org/linux-mmc/20220531171922.76080-1-adrian.hunter@xxxxxxxxx/ and for the above suggestion as well.
Will soon post the updated patches.
> >>>
> >>>>
> >>>> Signed-off-by: Shaik Sajida Bhanu <sbhanu@xxxxxxxxxxxxxx>
> >>>> Signed-off-by: kamasali <quic_kamasali@xxxxxxxxxxx>
> >>>> Signed-off-by: Srinivasarao Pathipati <quic_spathi@xxxxxxxxxxx>
> >>>> ---
> >>>> drivers/mmc/core/block.c | 8 +++++---
> >>>> drivers/mmc/host/cqhci-core.c | 7 +++++--
> >>>> include/linux/mmc/host.h | 1 +
> >>>> 3 files changed, 11 insertions(+), 5 deletions(-)
> >>>>
> >>>> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
> >>>> index
> >>>> b35e7a9..f63bf33 100644
> >>>> --- a/drivers/mmc/core/block.c
> >>>> +++ b/drivers/mmc/core/block.c
> >>>> @@ -1482,10 +1482,12 @@ void mmc_blk_cqe_recovery(struct
> mmc_queue
> >>> *mq)
> >>>> pr_debug("%s: CQE recovery start\n", mmc_hostname(host));
> >>>>
> >>>> err = mmc_cqe_recovery(host);
> >>>> - if (err)
> >>>> + if (err || host->need_hw_reset) {
> >>>> mmc_blk_reset(mq->blkdata, host,
> >>> MMC_BLK_CQE_RECOVERY);
> >>>> - else
> >>>> - mmc_blk_reset_success(mq->blkdata,
> >>> MMC_BLK_CQE_RECOVERY);
> >>>> + if (host->need_hw_reset)
> >>>> + host->need_hw_reset = false;
> >>>> + }
> >>>> + mmc_blk_reset_success(mq->blkdata, MMC_BLK_CQE_RECOVERY);
> >>>>
> >>>> pr_debug("%s: CQE recovery done\n", mmc_hostname(host)); } diff
> >>>> --git a/drivers/mmc/host/cqhci-core.c
> >>>> b/drivers/mmc/host/cqhci-core.c index b0d30c3..311b510 100644
> >>>> --- a/drivers/mmc/host/cqhci-core.c
> >>>> +++ b/drivers/mmc/host/cqhci-core.c
> >>>> @@ -812,18 +812,21 @@ static void cqhci_finish_mrq(struct mmc_host
> >>>> *mmc, unsigned int tag) irqreturn_t cqhci_irq(struct mmc_host
> >>>> *mmc, u32
> >>> intmask, int cmd_error,
> >>>> int data_error)
> >>>> {
> >>>> - u32 status;
> >>>> + u32 status, ice_err;
> >>>> unsigned long tag = 0, comp_status;
> >>>> struct cqhci_host *cq_host = mmc->cqe_private;
> >>>>
> >>>> status = cqhci_readl(cq_host, CQHCI_IS);
> >>>> cqhci_writel(cq_host, status, CQHCI_IS);
> >>>> + ice_err = status & (CQHCI_IS_GCE | CQHCI_IS_ICCE);
> >>>>
> >>>> pr_debug("%s: cqhci: IRQ status: 0x%08x\n", mmc_hostname(mmc),
> >>>> status);
> >>>>
> >>>> if ((status & (CQHCI_IS_RED | CQHCI_IS_GCE | CQHCI_IS_ICCE)) ||
> >>>> - cmd_error || data_error)
> >>>> + cmd_error || data_error || ice_err){
> >>>> + mmc->need_hw_reset = true;
> >>>> cqhci_error_irq(mmc, status, cmd_error, data_error);
> >>>> + }
> >>>>
> >>>> if (status & CQHCI_IS_TCC) {
> >>>> /* read TCN and complete the request */ diff --git
> >>>> a/include/linux/mmc/host.h b/include/linux/mmc/host.h index
> >>>> c193c50..3d00bcf 100644
> >>>> --- a/include/linux/mmc/host.h
> >>>> +++ b/include/linux/mmc/host.h
> >>>> @@ -492,6 +492,7 @@ struct mmc_host {
> >>>> int cqe_qdepth;
> >>>> bool cqe_enabled;
> >>>> bool cqe_on;
> >>>> + bool need_hw_reset;
> >>>>
> >>>> /* Inline encryption support */
> >>>> #ifdef CONFIG_MMC_CRYPTO
> >>>
> >>
> >