Re: [PATCH] PCI: pciehp: Fix system hang on resume after hot-unplug during suspend

From: Lukas Wunner
Date: Tue Oct 01 2024 - 07:04:26 EST


On Mon, Sep 30, 2024 at 09:31:53AM +0800, AceLan Kao wrote:
> Lukas Wunner <lukas@xxxxxxxxx> wrote on 2024-09-28 at 8:51:
> > -	if (pci_get_dsn(pdev) != ctrl->dsn)
> > +	dsn = pci_get_dsn(pdev);
> > +	if (!PCI_POSSIBLE_ERROR(dsn) &&
> > +	    dsn != ctrl->dsn)
> >  		return true;
>
> In my case, the pciehp_device_replaced() returns true from this final check.
> And these are the values I got
> dsn = 0x00000000, ctrl->dsn = 0x7800AA00
> dsn = 0x00000000, ctrl->dsn = 0x21B7D000

Ah, that's because pci_get_dsn() returns 0 if the device is gone.
Below is a modified patch which returns false in that case.

I've only changed:
-	dsn = pci_get_dsn(pdev);
-	if (!PCI_POSSIBLE_ERROR(dsn) &&
+	if ((dsn = pci_get_dsn(pdev)) &&
+	    !PCI_POSSIBLE_ERROR(dsn) &&
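
To illustrate the intent of that change, here is a minimal userspace
sketch, not the kernel code.  The name dsn_says_replaced() and its
parameters are made up for illustration, and the all-ones comparison
merely stands in for PCI_POSSIBLE_ERROR() on a 64-bit value:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the final DSN check.
 * dsn:       value read back on resume (0 if nothing could be read)
 * saved_dsn: value cached before suspend (ctrl->dsn in the patch)
 */
static bool dsn_says_replaced(uint64_t dsn, uint64_t saved_dsn)
{
	/* 0 means no serial number was readable, e.g. the device is gone */
	if (dsn == 0)
		return false;

	/* all ones is what a failed config space read typically yields */
	if (dsn == UINT64_MAX)
		return false;

	/* only a valid but different serial number counts as a replacement */
	return dsn != saved_dsn;
}
```

With the values reported above (dsn = 0, ctrl->dsn = 0x7800AA00) this
returns false, so the DSN check by itself no longer claims the device
was replaced.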


> Did some other test
> TBT HDD -> TBT dock -> laptop
> suspend
> TBT HDD -> laptop(replace TBT dock with the TBT HDD)
> resume
> Got the same result as above, looks like it didn't detect the TBT dock
> has been replaced by TBT HDD.
>
> In the origin call trace, unplug TBT dock or replace it with TBT HDD,
> it returns true by the below check
> 	if (pci_read_config_dword(pdev, PCI_VENDOR_ID, &reg) ||
> 	    reg != (pdev->vendor | (pdev->device << 16)) ||
> 	    pci_read_config_dword(pdev, PCI_CLASS_REVISION, &reg) ||
> 	    reg != (pdev->revision | (pdev->class << 8)))
> 		return true;

Hm, that's odd. Why is that? Is reg == 0xffffffff in one of those cases?
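
For reference, a small userspace model of that comparison.  It is only
a sketch: ids_mismatch() and its parameters are hypothetical names, the
IDs used with it are arbitrary, and it assumes both config reads report
success (the real check also returns true if a read itself fails):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the vendor/device and class/revision comparison.
 * vendor_reg, class_reg: dwords as read back from config space on resume
 * vendor, device, revision, class: values cached from before suspend
 */
static bool ids_mismatch(uint32_t vendor_reg, uint32_t class_reg,
			 uint16_t vendor, uint16_t device,
			 uint8_t revision, uint32_t class)
{
	/* expected dword at PCI_VENDOR_ID: device ID in the upper 16 bits */
	uint32_t want_id = (uint32_t)vendor | ((uint32_t)device << 16);

	/* expected dword at PCI_CLASS_REVISION: class code above revision */
	uint32_t want_class = (uint32_t)revision | (class << 8);

	return vendor_reg != want_id || class_reg != want_class;
}
```

If both dwords read back as 0xffffffff, the first comparison already
fails for any real vendor/device pair, so the check would return true
regardless of what is actually plugged in.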

I guess that could happen if the Thunderbolt tunnels are not yet
established at that point (i.e. in the ->resume_noirq phase),
but normally they should be. Does this system use ICM-controlled
tunnel management or kernel-native (software-controlled) tunnel
management?

Thanks,

Lukas