Re: 5.7 regression: Lots of PCIe AER errors and suspend failure without pcie=noaer
From: Kai-Heng Feng
Date:  Fri Jul 24 2020 - 10:32:14 EST
Hi Robert,
> On Jul 22, 2020, at 07:55, Robert Hancock <hancockrwd@xxxxxxxxx> wrote:
> 
> On Fri, Jul 10, 2020 at 6:28 PM Robert Hancock <hancockrwd@xxxxxxxxx> wrote:
>> 
>> On Fri, Jul 10, 2020 at 6:23 PM Robert Hancock <hancockrwd@xxxxxxxxx> wrote:
>>> 
>>> Noticed a problem on my desktop with an Asus PRIME H270-PRO
>>> motherboard after Fedora 32 upgraded to the 5.7 kernel (now on 5.7.8):
>>> periodically there are PCIe AER errors getting spewed in dmesg that
>>> weren't happening before, and this also seems to cause suspend to
>>> fail - the system just wakes back up again right away, I am assuming
>>> due to some AER errors interrupting the process. 5.6 kernels didn't
>>> have this problem. Setting "pcie=noaer" on the kernel command line
>>> works around the issue, but I'm not sure what would have changed to
>>> trigger this to occur?
>> 
>> Correction: the workaround option is "pci=noaer".
> 
> As a follow-up, from some more experimentation, it appears that
> disabling PCIe ASPM with setpci on both the ASMedia PCIe-PCI bridge as
> well as the PCIe root port it is connected to seems to silence the AER
> errors and allow suspend/resume to work again:
> 
> setpci -s 00:1c.0 0x50.B=0x00
> setpci -s 02:00.0 0x90.B=0x00
> 
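The two setpci commands each write 0x00 to a single byte (.B) of configuration space. Assuming the PCI Express capability starts at 0x40 on the root port (00:1c.0) and at 0x80 on the ASMedia bridge (02:00.0), which is an assumption (`lspci -vv` shows the real offsets), 0x50 and 0x90 are each device's Link Control register, i.e. capability base + 0x10. Bits 1:0 of Link Control are ASPM Control (0b00 disabled, 0b01 L0s, 0b10 L1, 0b11 both). The hypothetical Python sketch below decodes that field and contrasts zeroing the whole byte with a masked read-modify-write that changes only bits 1:0.

```python
# Hypothetical sketch: decode and clear the ASPM Control field of a PCIe
# Link Control register. The capability offsets are ASSUMED for the two
# devices in this thread; they are not read from real hardware.

PCIE_CAP_OFFSET = {"00:1c.0": 0x40, "02:00.0": 0x80}  # assumed positions
LNKCTL = 0x10      # Link Control offset within the PCIe capability
ASPM_MASK = 0x03   # Link Control bits 1:0 = ASPM Control

ASPM_STATES = {0b00: "disabled", 0b01: "L0s", 0b10: "L1", 0b11: "L0s+L1"}


def lnkctl_offset(bdf):
    """Config-space offset of Link Control, given the assumed cap offset."""
    return PCIE_CAP_OFFSET[bdf] + LNKCTL


def aspm_state(lnkctl):
    """Decode the ASPM Control field of a Link Control value."""
    return ASPM_STATES[lnkctl & ASPM_MASK]


def masked_write(old, value, mask):
    """Read-modify-write: change only the bits set in `mask`."""
    return (old & ~mask) | (value & mask)


if __name__ == "__main__":
    for bdf in PCIE_CAP_OFFSET:
        print(f"{bdf}: Link Control at {lnkctl_offset(bdf):#04x}")
    # Example value: Common Clock Configuration (bit 6) plus L0s and L1 enabled.
    old = 0x43
    new = masked_write(old, 0x00, ASPM_MASK)
    print(aspm_state(old), "->", aspm_state(new))
    print(f"masked write leaves {new:#04x}; zeroing the byte leaves 0x00")
```

Writing 0x00 to the whole byte, as the commands above do, also clears bits such as Common Clock Configuration (bit 6). For a quick diagnostic that may be harmless, but the masked form is the narrower change.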
> It appears the behavior changed as a result of this patch (which went
> into the stable tree for 5.7.6 and so affects 5.7 kernels as well):
> 
> commit 66ff14e59e8a30690755b08bc3042359703fb07a
> Author: Kai-Heng Feng <kai.heng.feng@xxxxxxxxxxxxx>
> Date:   Wed May 6 01:34:21 2020 +0800
> 
>    PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges
> 
>    7d715a6c1ae5 ("PCI: add PCI Express ASPM support") added the ability for
>    Linux to enable ASPM, but for some undocumented reason, it didn't enable
>    ASPM on links where the downstream component is a PCIe-to-PCI/PCI-X Bridge.
> 
>    Remove this exclusion so we can enable ASPM on these links.
> 
>    The Dell OptiPlex 7080 mentioned in the bugzilla has a TI XIO2001
>    PCIe-to-PCI Bridge.  Enabling ASPM on the link leading to it allows the
>    Intel SoC to enter deeper Package C-states, which is a significant power
>    savings.
> 
>    [bhelgaas: commit log]
>    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207571
>    Link: https://lore.kernel.org/r/20200505173423.26968-1-kai.heng.feng@xxxxxxxxxxxxx
>    Signed-off-by: Kai-Heng Feng <kai.heng.feng@xxxxxxxxxxxxx>
>    Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
>    Reviewed-by: Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx>
> 
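The commit above changes a single eligibility rule. The hypothetical Python sketch below models only that rule; the names are invented, and the many other checks in the kernel's drivers/pci/pcie/aspm.c are ignored. Before the commit, a link whose downstream component is a PCIe-to-PCI/PCI-X bridge was never given ASPM; after it, such a link is treated like any other.

```python
# Hypothetical model of the policy change in commit 66ff14e59e8a
# ("PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges").
# Type and function names are invented; this is not kernel code.
from enum import Enum


class DevType(Enum):
    ENDPOINT = "endpoint"
    PCIE_TO_PCI_BRIDGE = "pcie-to-pci/pci-x bridge"


def may_enable_aspm(downstream, allow_bridges):
    """Return True if ASPM may be enabled on a link to `downstream` devices.

    allow_bridges=False models kernels without the commit, which skipped
    any link whose downstream component is a PCIe-to-PCI/PCI-X bridge.
    allow_bridges=True models kernels with the commit. All other ASPM
    eligibility checks are deliberately left out of this model.
    """
    if allow_bridges:
        return True
    return all(dev is not DevType.PCIE_TO_PCI_BRIDGE for dev in downstream)


if __name__ == "__main__":
    asm_link = [DevType.PCIE_TO_PCI_BRIDGE]  # e.g. the ASM1083/1085 link
    print("before:", may_enable_aspm(asm_link, allow_bridges=False))
    print("after: ", may_enable_aspm(asm_link, allow_bridges=True))
```

Under this model the link to the ASM1083/1085 bridge flips from ineligible to eligible, which lines up with the regression appearing once the commit reached the 5.7 stable kernels.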
> Unfortunately it appears that this ASMedia PCIe-PCI bridge:
> 
> 02:00.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1083/1085 PCIe
> to PCI Bridge [1b21:1080] (rev 04)
> 
> doesn't cope with ASPM properly and causes a bunch of PCIe link
> errors. (This is in addition to some brokenness known as far back as
> 2012 with these ASM1083/1085 chips with regard to PCI interrupts
> getting stuck, but this ASPM problem causes issues even if no devices
> are connected to the PCI side of the bridge, as is the case on my
> system.)
> 
> Might need a quirk to disable ASPM on this device?
Yes, I think that's a great idea.
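A targeted quirk would restore the old behavior for this one device only. In the kernel, such quirks are usually registered with the DECLARE_PCI_FIXUP_* macros in drivers/pci/quirks.c, and a fixup could call pci_disable_link_state() with PCIE_LINK_STATE_L0S | PCIE_LINK_STATE_L1. The Python below is only a hypothetical model of the matching logic, with invented names throughout: a table keyed on the [vendor:device] ID from the lspci output quoted above (1b21:1080).

```python
# Hypothetical model of a device-ID-keyed ASPM quirk. All names here are
# invented for illustration; this is not the kernel's implementation.
import re

# (vendor, device) -> ASPM states a quirk would disable on the link.
# 1b21:1080 is the ASMedia ASM1083/1085 bridge reported in this thread.
ASPM_QUIRKS = {
    (0x1B21, 0x1080): frozenset({"L0s", "L1"}),
}

# Matches the numeric "[vvvv:dddd]" ID printed by `lspci -nn`;
# the "[0604]" class code has no colon, so it is not matched.
_ID_RE = re.compile(r"\[([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\]")


def parse_lspci_ids(line):
    """Return (vendor, device) as ints from one `lspci -nn` line."""
    match = _ID_RE.search(line)
    if match is None:
        raise ValueError(f"no [vendor:device] ID in {line!r}")
    return int(match.group(1), 16), int(match.group(2), 16)


def allowed_aspm_states(downstream_ids, supported):
    """ASPM states still allowed on a link after applying the quirk table."""
    disabled = set()
    for ids in downstream_ids:
        disabled |= ASPM_QUIRKS.get(ids, frozenset())
    return set(supported) - disabled


if __name__ == "__main__":
    line = ("02:00.0 PCI bridge [0604]: ASMedia Technology Inc. "
            "ASM1083/1085 PCIe to PCI Bridge [1b21:1080] (rev 04)")
    ids = parse_lspci_ids(line)
    print(f"{ids[0]:04x}:{ids[1]:04x}", allowed_aspm_states([ids], {"L0s", "L1"}))
```

Keying the quirk on the exact vendor/device pair keeps ASPM available on other PCIe-to-PCI bridges, such as the TI XIO2001 whose power savings motivated the original commit.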
Can you please file a bug at [1] so we can continue the discussion there?
[1] https://bugzilla.kernel.org
Kai-Heng