Re: [PATCH V1] mmc: sdhci-pci-gli: GL975[05]: Mask the replay timer timeout of AER
From: Kai-Heng Feng
Date: Wed Oct 11 2023 - 02:35:03 EST
On Fri, Oct 6, 2023 at 6:30 PM Victor Shih <victorshihgli@xxxxxxxxx> wrote:
>
> On Mon, Oct 2, 2023 at 10:18 AM Kai-Heng Feng
> <kai.heng.feng@xxxxxxxxxxxxx> wrote:
> >
> > Hi Victor,
> >
> > On Tue, Sep 26, 2023 at 4:21 PM Victor Shih <victorshihgli@xxxxxxxxx> wrote:
> > >
> > > On Fri, Sep 22, 2023 at 3:11 PM Kai-Heng Feng
> > > <kai.heng.feng@xxxxxxxxxxxxx> wrote:
> > > >
> > > > Hi Victor,
> > > >
> > > > On Wed, Sep 20, 2023 at 4:54 PM Victor Shih <victorshihgli@xxxxxxxxx> wrote:
> > > > >
> > > > > On Tue, Sep 19, 2023 at 3:31 PM Kai-Heng Feng
> > > > > <kai.heng.feng@xxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > Hi Victor,
> > > > > >
> > > > > > On Tue, Sep 19, 2023 at 3:10 PM Victor Shih <victorshihgli@xxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Tue, Sep 19, 2023 at 12:24 PM Kai-Heng Feng
> > > > > > > <kai.heng.feng@xxxxxxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > Hi Victor,
> > > > > > > >
> > > > > > > > On Mon, Sep 18, 2023 at 6:31 PM Victor Shih <victorshihgli@xxxxxxxxx> wrote:
> > > > > > > > >
> > > > > > > > > From: Victor Shih <victor.shih@xxxxxxxxxxxxxxxxxxx>
> > > > > > > > >
> > > > > > > > > Due to a flaw in the hardware design, the GL975x replay timer frequently
> > > > > > > > > times out when ASPM is enabled. As a result, the system resumes
> > > > > > > > > immediately after it enters suspend. Therefore, the replay timer
> > > > > > > > > timeout must be masked.
> > > > > > > >
> > > > > > > > This patch solves the AER error when its PCI config space gets accessed,
> > > > > > > > but the AER error still happens at system suspend:
> > > > > > > >
> > > > > > > > [ 1100.103603] ACPI: EC: interrupt blocked
> > > > > > > > [ 1100.268244] ACPI: EC: interrupt unblocked
> > > > > > > > [ 1100.326960] pcieport 0000:00:1c.0: AER: Corrected error received: 0000:00:1c.0
> > > > > > > > [ 1100.326991] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
> > > > > > > > [ 1100.326993] pcieport 0000:00:1c.0: device [8086:7ab9] error status/mask=00001000/00002000
> > > > > > > > [ 1100.326996] pcieport 0000:00:1c.0: [12] Timeout
> > > > > > > >
> > > > > > > > Kai-Heng
> > > > > > > >
> > > > > > >
> > > > > > > Hi, Kai-Heng
> > > > > > >
> > > > > > > Could you try applying the patch and retesting after restarting
> > > > > > > the system?
> > > > > >
> > > > > > Same issue happens after coldboot.
> > > > > >
> > > > > > > I ask because when I applied the patch and restarted the system, it
> > > > > > > didn't happen. The system can enter suspend normally.
> > > > > > >
> > > > > > > If you still have the issue after following the above instructions,
> > > > > > > please provide me with your environment and I will verify it again.
> > > > > >
> > > > > > The patch gets applied on top of next-20230918. Please let me know
> > > > > > what else you want to know.
> > > > > >
> > > > > > Kai-Heng
> > > > > >
> > > > >
> > > > > Hi, Kai-Heng
> > > > >
> > > > > If I want to mask the replay timer timeout AER error of the upstream root port,
> > > > > could you give me some suggestions?
> > > > > Or could you provide sample code for my reference?
> > > >
> > > > I am not aware of any way to mask "replay timer timeout" from the root port.
> > > > I wonder if the device supports D3hot? Or should it stay in D0 when
> > > > ASPM L1.2 is enabled?
> > > >
> > > > Kai-Heng
> > > >
> > >
> > > Hi, Kai-Heng
> > >
> > > Do you know any way to mask the replay timer timeout AER of the
> > > upstream port from the device?
> >
> > Per PCIe Spec, I don't think it's possible to only mask 'replay timer timeout'.
> >
> > > The device supports D3hot.
> >
> > Do you think such an error plays any crucial role? Otherwise, disabling
> > 'correctable' errors may be plausible.
> >
> > Kai-Heng
> >
>
> Hi, Kai-Heng
>
> Due to a flaw in the hardware design, the GL975x replay timer frequently
> times out when ASPM is enabled.
> This patch solves the AER error of the replay timer timeout for GL975x.
> We have not encountered any other errors so far.

On the system I tested, this patch reduces the occurrence of the
error, but does not completely eliminate it.

> Do your 'correctable' errors mean the AER error of the replay timer timeout?
> May I ask if you have any other comments on this patch?

When spamming `lspci -vv -s` on the device, the AER error can still be
observed. I think the "correctable" mask should be optional; let me send
a patch to the PCI subsystem for comment.

Kai-Heng
>
> Thanks, Victor Shih
>
> > >
> > > Thanks, Victor Shih
> > >
> > > > >
> > > > > Thanks, Victor Shih
> > > > >
> > > > > > >
> > > > > > > Thanks, Victor Shih
> > > > > > >
> > > > > > > > >
> > > > > > > > > Signed-off-by: Victor Shih <victor.shih@xxxxxxxxxxxxxxxxxxx>
> > > > > > > > > ---
> > > > > > > > > drivers/mmc/host/sdhci-pci-gli.c | 16 ++++++++++++++++
> > > > > > > > > 1 file changed, 16 insertions(+)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/mmc/host/sdhci-pci-gli.c b/drivers/mmc/host/sdhci-pci-gli.c
> > > > > > > > > index d83261e857a5..d8a991b349a8 100644
> > > > > > > > > --- a/drivers/mmc/host/sdhci-pci-gli.c
> > > > > > > > > +++ b/drivers/mmc/host/sdhci-pci-gli.c
> > > > > > > > > @@ -28,6 +28,9 @@
> > > > > > > > > #define PCI_GLI_9750_PM_CTRL 0xFC
> > > > > > > > > #define PCI_GLI_9750_PM_STATE GENMASK(1, 0)
> > > > > > > > >
> > > > > > > > > +#define PCI_GLI_9750_CORRERR_MASK 0x214
> > > > > > > > > +#define PCI_GLI_9750_CORRERR_MASK_REPLAY_TIMER_TIMEOUT BIT(12)
> > > > > > > > > +
> > > > > > > > > #define SDHCI_GLI_9750_CFG2 0x848
> > > > > > > > > #define SDHCI_GLI_9750_CFG2_L1DLY GENMASK(28, 24)
> > > > > > > > > #define GLI_9750_CFG2_L1DLY_VALUE 0x1F
> > > > > > > > > @@ -152,6 +155,9 @@
> > > > > > > > > #define PCI_GLI_9755_PM_CTRL 0xFC
> > > > > > > > > #define PCI_GLI_9755_PM_STATE GENMASK(1, 0)
> > > > > > > > >
> > > > > > > > > +#define PCI_GLI_9755_CORRERR_MASK 0x214
> > > > > > > > > +#define PCI_GLI_9755_CORRERR_MASK_REPLAY_TIMER_TIMEOUT BIT(12)
> > > > > > > > > +
> > > > > > > > > #define SDHCI_GLI_9767_GM_BURST_SIZE 0x510
> > > > > > > > > #define SDHCI_GLI_9767_GM_BURST_SIZE_AXI_ALWAYS_SET BIT(8)
> > > > > > > > >
> > > > > > > > > @@ -561,6 +567,11 @@ static void gl9750_hw_setting(struct sdhci_host *host)
> > > > > > > > > value &= ~PCI_GLI_9750_PM_STATE;
> > > > > > > > > pci_write_config_dword(pdev, PCI_GLI_9750_PM_CTRL, value);
> > > > > > > > >
> > > > > > > > > + /* mask the replay timer timeout of AER */
> > > > > > > > > + pci_read_config_dword(pdev, PCI_GLI_9750_CORRERR_MASK, &value);
> > > > > > > > > + value |= PCI_GLI_9750_CORRERR_MASK_REPLAY_TIMER_TIMEOUT;
> > > > > > > > > + pci_write_config_dword(pdev, PCI_GLI_9750_CORRERR_MASK, value);
> > > > > > > > > +
> > > > > > > > > gl9750_wt_off(host);
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > @@ -770,6 +781,11 @@ static void gl9755_hw_setting(struct sdhci_pci_slot *slot)
> > > > > > > > > value &= ~PCI_GLI_9755_PM_STATE;
> > > > > > > > > pci_write_config_dword(pdev, PCI_GLI_9755_PM_CTRL, value);
> > > > > > > > >
> > > > > > > > > + /* mask the replay timer timeout of AER */
> > > > > > > > > + pci_read_config_dword(pdev, PCI_GLI_9755_CORRERR_MASK, &value);
> > > > > > > > > + value |= PCI_GLI_9755_CORRERR_MASK_REPLAY_TIMER_TIMEOUT;
> > > > > > > > > + pci_write_config_dword(pdev, PCI_GLI_9755_CORRERR_MASK, value);
> > > > > > > > > +
> > > > > > > > > gl9755_wt_off(pdev);
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > 2.25.1
> > > > > > > > >