Re: [PATCH] PCI/portdrv: Skip enabling AER on external facing ports

From: Kai-Heng Feng
Date: Fri Jan 21 2022 - 07:31:50 EST


Hi Mika,

On Fri, Jan 21, 2022 at 6:55 PM Mika Westerberg
<mika.westerberg@xxxxxxxxxxxxxxx> wrote:
>
> Hi Kai-Heng,
>
> On Fri, Jan 07, 2022 at 12:09:57PM +0800, Kai-Heng Feng wrote:
> > Only from root ports of thunderbolt devices.
> >
> > The error occurs as soon as the root port is runtime suspended to D3cold.
> >
> > Runtime suspend the AER service can resolve the issue. I wonder if
> > it's the right thing to do here?
>
> I think you are right here. It seems that AER "service driver" is
> completely missing PM hooks. Probably because it is more used in server
> type of systems where power management is not priority.

Here is my previous attempt to suspend AER:
https://lore.kernel.org/linux-pci/20210127173101.446940-1-kai.heng.feng@xxxxxxxxxxxxx/

>
> > D3cold should also mean the PCI link is gone, disabling AER seems to
> > be a reasonable approach.
>
> Indeed - I think AER might trigger here because the link does "down" /
> low power state if left enabled while the root port enters D3. Something
> like below hack should disable it over low power transitions:

Ubuntu kernel has been carrying the patch for quite some time:
https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/unstable/commit/?id=e82f15f1a26273b004054a81ef45937fb1b632e5

>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 9fa1f97e5b27..64138cf82db8 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1432,6 +1432,22 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
> return rc ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_RECOVERED;
> }
>
> +static int aer_suspend(struct pcie_device *dev)
> +{
> + struct aer_rpc *rpc = get_service_data(dev);
> +
> + aer_disable_rootport(rpc);
> + return 0;
> +}
> +
> +static int aer_resume(struct pcie_device *dev)
> +{
> + struct aer_rpc *rpc = get_service_data(dev);
> +
> + aer_enable_rootport(rpc);
> + return 0;
> +}
> +
> static struct pcie_port_service_driver aerdriver = {
> .name = "aer",
> .port_type = PCIE_ANY_PORT,
> @@ -1439,6 +1455,10 @@ static struct pcie_port_service_driver aerdriver = {
>
> .probe = aer_probe,
> .remove = aer_remove,
> + .suspend = aer_suspend,
> + .resume = aer_resume,
> + .runtime_suspend = aer_suspend,
> + .runtime_resume = aer_resume,
> };

This patch is exactly what I tested.

Maybe only suspend/runtime_suspend AER when the target PM state is D3cold?
PCIe spec doesn't say how to handle AER in Link L2/L3Ready/L3, but I
think it's reasonable to suspend AER when power is loss.

Let me come up with a patch with that idea.

Kai-Heng

>
> /**