[Regression in 6.14-rc1] System suspend/resume broken by PCI commit 1db806ec06b7c

From: Rafael J. Wysocki
Date: Mon Feb 03 2025 - 15:13:01 EST


Hi,

The following commit:

commit 1db806ec06b7c6e08e8af57088da067963ddf117
Author: Jian-Hong Pan <jhp@xxxxxxxxxxxxx>
Date: Fri Nov 15 15:22:02 2024 +0800

PCI/ASPM: Save parent L1SS config in pci_save_aspm_l1ss_state()

After 17423360a27a ("PCI/ASPM: Save L1 PM Substates Capability for
suspend/resume"), pci_save_aspm_l1ss_state(dev) saves the L1SS state for
"dev", and pci_restore_aspm_l1ss_state(dev) restores the state for both
"dev" and its parent.

The problem is that unless pci_save_state() has been used in some other
path and has already saved the parent L1SS state, we will restore junk to
the parent, which means the L1 Substates likely won't work correctly.

Save the L1SS config for both the device and its parent in
pci_save_aspm_l1ss_state(). When restoring, we need both because L1SS must
be enabled at the parent (the Downstream Port) before being enabled at the
child (the Upstream Port).

Link: https://lore.kernel.org/r/20241115072200.37509-3-jhp@xxxxxxxxxxxxx
Fixes: 17423360a27a ("PCI/ASPM: Save L1 PM Substates Capability for suspend/resume")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218394
Suggested-by: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx>
Signed-off-by: Jian-Hong Pan <jhp@xxxxxxxxxxxxx>
[bhelgaas: parallel save/restore structure, simplify commit log, patch at
https://lore.kernel.org/r/20241212230340.GA3267194@bhelgaas]
Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
Tested-by: Jian-Hong Pan <jhp@xxxxxxxxxxxxx> # Asus B1400CEAE

broke system suspend/resume on my Dell XPS13 9360. It doesn't even
pass suspend/resume testing after "echo devices > /sys/power/pm_test".
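
For reference, the "devices" test level can be exercised with something along the lines of the sketch below. It assumes root privileges and a kernel that exposes /sys/power/pm_test; the function name pm_test_devices and the PM_SYSFS override are made up for illustration and are not kernel interfaces.

```shell
#!/bin/sh
# Sketch only: run one suspend/resume cycle at the "devices" pm_test level.
# PM_SYSFS and pm_test_devices are illustrative names (assumptions).
PM_SYSFS="${PM_SYSFS:-/sys/power}"

pm_test_devices() {
    # Ask the PM core to stop after suspending devices instead of
    # actually entering a sleep state.
    echo devices > "$PM_SYSFS/pm_test" || return 1

    # Start the transition. In test mode the kernel suspends devices,
    # waits a few seconds and resumes them on its own.
    echo mem > "$PM_SYSFS/state"
    _rc=$?

    # Switch test mode off again whatever the outcome was.
    echo none > "$PM_SYSFS/pm_test"
    return "$_rc"
}
```

Sourcing this and calling pm_test_devices runs a single test cycle; no real system suspend takes place at this level, so only the device suspend/resume callbacks are involved.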

It looks like PCIe links are all down during resume after the above
commit, but it is rather hard to collect any data in that state.

Reverting the above commit on top of 6.14-rc1 makes suspend/resume work
again without any problems.

I'm unsure what exactly the problem is at the moment, but I'm going to
check a couple of theories.

Cheers,
Rafael