Re: [PATCH 5.4 182/389] PCI/portdrv: Don't disable AER reporting in get_port_device_capability()
From: Bjorn Helgaas
Date: Fri Mar 31 2023 - 18:06:40 EST
[+cc iwlwifi folks]
Re: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in
get_port_device_capability()")
On Wed, Mar 29, 2023 at 04:17:29PM -0700, Ben Greear wrote:
> On 8/30/22 3:16 PM, Ben Greear wrote:
> ...
> I notice this patch appears to be in 6.2.6 kernel, and my kernel logs are
> full of spam and system is unstable. Possibly the unstable part is related
> to something else, but the log spam is definitely extreme.
>
> These systems are fairly stable on 5.19-ish kernels without the patch in
> question.
Hmmm, I was going to thank you for the report, but looking closer, I
see that you reported this last August [1] and we *should* have
pursued it with the iwlwifi folks or figured out what the PCI core is
doing wrong, but I totally dropped the ball. Sorry about that.
To make sure we're all on the same page, we're talking about
8795e182b02d ("PCI/portdrv: Don't disable AER reporting in
get_port_device_capability()") [2],
which is present in v6.0 and later [3] but not v5.19.16 [4].
> Here is sample of the spam:
>
> [ 1675.547023] pcieport 0000:03:02.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
> [ 1675.556851] pcieport 0000:03:02.0: device [10b5:8619] error status/mask=00100000/00000000
> [ 1675.563904] pcieport 0000:03:02.0: [20] UnsupReq (First)
> [ 1675.569398] pcieport 0000:03:02.0: AER: TLP Header: 34000000 05001f10 00000000 88c888c8
> [ 1675.576296] iwlwifi 0000:05:00.0: AER: can't recover (no error_detected callback)
The TLP header says this is an LTR message from 05:00.0. Apparently
the bridge above 05:00.0 is 03:02.0, which logged an Unsupported
Request error for the message, probably because 03:02.0 doesn't have
LTR enabled.
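For anyone who wants to check that reading of the TLP header, here is a minimal sketch of the decode, assuming the standard Message TLP header layout (Fmt and Type in DW0, Requester ID, Tag and Message Code in DW1). The struct and the helper names (tlp_msg, tlp_decode, tlp_is_msg, tlp_print) are made up for illustration and are not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the Message TLP header fields we care about. */
struct tlp_msg {
	uint8_t fmt;      /* DW0 bits 31:29 */
	uint8_t type;     /* DW0 bits 28:24 */
	uint8_t bus;      /* Requester ID bits 15:8 */
	uint8_t dev;      /* Requester ID bits 7:3 */
	uint8_t fn;       /* Requester ID bits 2:0 */
	uint8_t tag;      /* DW1 bits 15:8 */
	uint8_t msg_code; /* DW1 bits 7:0 */
};

#define TLP_MSG_CODE_LTR 0x10 /* Message Code for Latency Tolerance Reporting */

/* Split the four header DWORDs (as printed by AER) into fields. */
static struct tlp_msg tlp_decode(const uint32_t dw[4])
{
	struct tlp_msg m;

	assert(dw != NULL);
	m.fmt      = (dw[0] >> 29) & 0x7;
	m.type     = (dw[0] >> 24) & 0x1f;
	m.bus      = (dw[1] >> 24) & 0xff;
	m.dev      = (dw[1] >> 19) & 0x1f;
	m.fn       = (dw[1] >> 16) & 0x7;
	m.tag      = (dw[1] >> 8) & 0xff;
	m.msg_code = dw[1] & 0xff;
	return m;
}

/* Message TLPs have Type 10rrr and a 4 DW header (Fmt 001 or 011). */
static int tlp_is_msg(const struct tlp_msg *m)
{
	return (m->fmt & 0x5) == 0x1 && (m->type & 0x18) == 0x10;
}

/* Print the requester in lspci-style bus:dev.fn form plus the message code. */
static void tlp_print(const struct tlp_msg *m)
{
	printf("%02x:%02x.%x: message code 0x%02x%s\n",
	       (unsigned)m->bus, (unsigned)m->dev, (unsigned)m->fn,
	       (unsigned)m->msg_code,
	       m->msg_code == TLP_MSG_CODE_LTR ? " (LTR)" : "");
}
```

Feeding it the header from the log (34000000 05001f10 00000000 88c888c8) gives a Message TLP with Requester ID 05:00.0 and Message Code 0x10, which is how I got "LTR message from 05:00.0".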
Can you collect the output of "sudo lspci -vv"? Does this happen even
before loading the iwlwifi driver? I assume there are no hotplug
events before this happens?
The PCI core enables LTR during enumeration for every device for which
LTR is supported and enabled along the entire path up to a Root Port.
If it does that wrong, you might see errors even before loading
iwlwifi.
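The rule above can be sketched as a toy model. This is not the PCI core implementation; the struct and function names (ltr_node, ltr_may_enable) are hypothetical, and it only illustrates the idea that a device should send LTR messages only when every bridge above it, up to the Root Port, also has LTR enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one device or bridge in the hierarchy. */
struct ltr_node {
	const struct ltr_node *upstream; /* NULL for a Root Port */
	bool ltr_supported;              /* device advertises the LTR mechanism */
	bool ltr_enabled;                /* LTR enable bit is set */
};

/*
 * Return true if it is safe to enable LTR on @dev: the device supports
 * LTR, and every upstream bridge up to the Root Port supports it and
 * has it enabled.
 */
static bool ltr_may_enable(const struct ltr_node *dev)
{
	const struct ltr_node *p;

	assert(dev != NULL);
	if (!dev->ltr_supported)
		return false;

	for (p = dev->upstream; p; p = p->upstream) {
		if (!p->ltr_supported || !p->ltr_enabled)
			return false;
	}
	return true;
}
```

In this model, an endpoint below a switch port that has LTR disabled must not enable LTR. If it sends LTR messages anyway, the port above it would be expected to flag them as Unsupported Requests, which matches what 03:02.0 logged.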
I see that iwlwifi *reads* PCI_EXP_DEVCTL2_LTR_EN in
iwl_pcie_apm_config(), which should be safe. I don't see any writes,
but the iwlwifi experts should know more about this. There are a
couple of paths that send an LTR_CONFIG command, which looks related:
  __iwl_mvm_mac_start
    iwl_mvm_up
      iwl_mvm_config_ltr
        if (trans->ltr_enabled)
          iwl_mvm_send_cmd_pdu(mvm, LTR_CONFIG, ...)
Bjorn
[1] https://lore.kernel.org/all/47b775c5-57fa-5edf-b59e-8a9041ffbee7@xxxxxxxxxxxxxxx/#t
[2] https://git.kernel.org/linus/8795e182b02d
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pcie/portdrv_core.c?id=v6.0#n223
[4] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/pci/pcie/portdrv_core.c?id=v5.19.16#n223