Re: 5.7 regression: Lots of PCIe AER errors and suspend failure without pcie=noaer

From: Kai-Heng Feng
Date: Fri Jul 24 2020 - 10:32:14 EST

Hi Robert,

> On Jul 22, 2020, at 07:55, Robert Hancock <hancockrwd@xxxxxxxxx> wrote:
> On Fri, Jul 10, 2020 at 6:28 PM Robert Hancock <hancockrwd@xxxxxxxxx> wrote:
>> On Fri, Jul 10, 2020 at 6:23 PM Robert Hancock <hancockrwd@xxxxxxxxx> wrote:
>>> Noticed a problem on my desktop with an Asus PRIME H270-PRO
>>> motherboard after Fedora 32 upgraded to the 5.7 kernel (now on 5.7.8):
>>> periodically there are PCIe AER errors getting spewed in dmesg that
>>> weren't happening before, and this also seems to cause suspend to
>>> fail - the system just wakes back up again right away, I am assuming
>>> due to some AER errors interrupting the process. 5.6 kernels didn't
>>> have this problem. Setting "pcie=noaer" on the kernel command line
>>> works around the issue, but I'm not sure what would have changed to
>>> trigger this to occur?
>> Correction: the workaround option is "pci=noaer".
> As a follow-up, from some more experimentation, it appears that
> disabling PCIe ASPM with setpci on both the ASMedia PCIe-PCI bridge and
> the PCIe root port it is connected to silences the AER
> errors and allows suspend/resume to work again:
> setpci -s 00:1c.0 0x50.B=0x00
> setpci -s 02:00.0 0x90.B=0x00
> It appears the behavior changed as a result of this patch (which went
> into the stable tree for 5.7.6 and so affects 5.7 kernels as well):
> commit 66ff14e59e8a30690755b08bc3042359703fb07a
> Author: Kai-Heng Feng <kai.heng.feng@xxxxxxxxxxxxx>
> Date: Wed May 6 01:34:21 2020 +0800
> PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges
> 7d715a6c1ae5 ("PCI: add PCI Express ASPM support") added the ability for
> Linux to enable ASPM, but for some undocumented reason, it didn't enable
> ASPM on links where the downstream component is a PCIe-to-PCI/PCI-X Bridge.
> Remove this exclusion so we can enable ASPM on these links.
> The Dell OptiPlex 7080 mentioned in the bugzilla has a TI XIO2001
> PCIe-to-PCI Bridge. Enabling ASPM on the link leading to it allows the
> Intel SoC to enter deeper Package C-states, which is a significant power
> savings.
> [bhelgaas: commit log]
> Bugzilla:
> Link:
> Signed-off-by: Kai-Heng Feng <kai.heng.feng@xxxxxxxxxxxxx>
> Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> Reviewed-by: Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx>
> Unfortunately it appears that this ASMedia PCIe-PCI bridge:
> 02:00.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1083/1085 PCIe
> to PCI Bridge [1b21:1080] (rev 04)
> doesn't cope with ASPM properly and causes a bunch of PCIe link
> errors. (This is in addition to some broken-ness known as far back as
> 2012 with these ASM1083/1085 chips with regard to PCI interrupts
> getting stuck, but this ASPM problem causes issues even if no devices
> are connected to the PCI side of the bridge, as is the case on my
> system.)
> Might need a quirk to disable ASPM on this device?

Yes, I think that's a great idea.

Can you please file a bug at [1] so we can continue the discussion there?
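
A note on the setpci workaround quoted above: ASPM Control is bits 1:0 of the PCIe Link Control register, and 0x50 / 0x90 are presumably the Link Control offsets on those two devices. Writing 0x00 to the whole byte also clears the other bits in that byte (for example Common Clock Configuration, bit 6). A minimal sketch of a narrower write that clears only the ASPM bits follows; the helper name clear_aspm_bits is made up for illustration, and the example value is not taken from real hardware.

```shell
#!/bin/sh
# ASPM Control occupies bits 1:0 of the PCIe Link Control register.
ASPM_MASK=0x03

# clear_aspm_bits BYTE
# Print BYTE (the low byte of Link Control) with only the ASPM Control
# bits cleared; every other bit keeps its current value.
clear_aspm_bits() {
    printf '0x%02x\n' "$(( $1 & ~$ASPM_MASK & 0xff ))"
}

# Example with a made-up register value (not read from real hardware):
clear_aspm_bits 0x43    # prints 0x40

# Hypothetical hardware usage (needs root). Assumes Link Control is at
# 0x90 on 02:00.0, as in the commands quoted above:
#   old=$(setpci -s 02:00.0 0x90.B)
#   setpci -s 02:00.0 0x90.B="$(clear_aspm_bits "0x$old")"
```

If the installed setpci supports named capabilities and value:mask writes, something like "setpci -s 02:00.0 CAP_EXP+10.b=0:3" should have the same effect without hard-coding the offset, but please check the setpci man page for your version before relying on it.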