Re: [PATCH 1/1] PCI/ASPM: Fix L1SS saving
From: Ilpo Järvinen
Date: Wed Feb 05 2025 - 03:38:44 EST
On Tue, 4 Feb 2025, Bjorn Helgaas wrote:
> [+cc Rafael]
>
> On Fri, Jan 31, 2025 at 05:29:13PM +0200, Ilpo Järvinen wrote:
> > The commit 1db806ec06b7 ("PCI/ASPM: Save parent L1SS config in
> > pci_save_aspm_l1ss_state()") aimed to perform L1SS config save for both
> > the Upstream Port and its upstream bridge when handling an Upstream
> > Port, which matches what the L1SS restore side does. However,
> > parent->state_saved can be set true at an earlier time when the
> > upstream bridge saved other parts of its state.
>
> So I guess the scenario is that we got here because some driver called
> pci_save_state(pdev):
>
>   pci_save_state
>     dev->state_saved = true                <--
>     pci_save_pcie_state
>       pci_save_aspm_l1ss_state
>         if (pcie_downstream_port(pdev))
>           return
>         # save pdev L1SS state here
>         if (parent->state_saved)           <--
>           return
>         # save parent L1SS state here
>
> and the problem is that we previously called pci_save_state(parent),
> which set "parent->state_saved = true" but did not save its L1SS state
> because pci_save_aspm_l1ss_state() is a no-op for Downstream Ports,
> right?
Yes! An unfortunate interaction between those two checks.
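For anyone following along, the interaction can be modeled in user space.
This is only a rough sketch, not kernel code: struct model_dev, the
model_*() helpers, the JUNK marker and the register values are all made up
for illustration, and only a single L1SS control register is modeled
instead of both CTL1 and CTL2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define JUNK 0xdeadbeefu	/* stands in for a save buffer that was never filled */

/* Hypothetical stand-in for struct pci_dev; not the kernel type. */
struct model_dev {
	struct model_dev *parent;	/* upstream bridge, if any */
	bool downstream_port;		/* what pcie_downstream_port() would report */
	bool state_saved;
	uint32_t l1ss_ctl1;		/* "live" register value */
	uint32_t saved_ctl1;		/* save buffer */
};

/* Models pci_save_aspm_l1ss_state(), with the removed check made optional. */
static void model_save_l1ss(struct model_dev *dev, bool check_parent_saved)
{
	if (dev->downstream_port)
		return;			/* no-op for Downstream Ports */

	dev->saved_ctl1 = dev->l1ss_ctl1;

	if (!dev->parent)
		return;
	if (check_parent_saved && dev->parent->state_saved)
		return;			/* the check the patch removes */

	dev->parent->saved_ctl1 = dev->parent->l1ss_ctl1;
}

/* Models pci_save_state(): the flag is set before L1SS state is handled. */
static void model_save_state(struct model_dev *dev, bool check_parent_saved)
{
	dev->state_saved = true;
	model_save_l1ss(dev, check_parent_saved);
}

/* Returns what a later restore would write into the bridge's L1SS config. */
uint32_t bridge_value_after_saves(bool check_parent_saved)
{
	struct model_dev bridge = {
		.downstream_port = true, .l1ss_ctl1 = 0x1234, .saved_ctl1 = JUNK,
	};
	struct model_dev endpoint = {
		.parent = &bridge, .l1ss_ctl1 = 0x5678, .saved_ctl1 = JUNK,
	};

	model_save_state(&bridge, check_parent_saved);	 /* e.g. from portdrv probe */
	model_save_state(&endpoint, check_parent_saved); /* later, from the Endpoint driver */

	assert(endpoint.saved_ctl1 == 0x5678);	/* the device's own state is always saved */
	return bridge.saved_ctl1;
}
```

In this model, bridge_value_after_saves(true) leaves the bridge's buffer at
JUNK because both early returns fire, while bridge_value_after_saves(false)
captures the bridge's live value.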
> But I would think this would be a very common situation because
> pcie_portdrv_probe() calls pci_save_state() for Downstream Ports when
> pciehp, AER, PME, etc, are enabled, and this would happen before the
> pci_save_state() calls from Endpoint drivers.
>
> So I'm a little surprised that this didn't blow up for everybody
> immediately. Is there something that makes this unusual?
I agree it should be very common, and I was quite surprised that -next
did not catch it. What I recall, though, is that you modified the patch
while applying it by adding those Downstream Port checks, so the fix
patch's Tested-by was for different code from what got applied (and it
would have been caught had the original author also tested the modified
commit).

Unfortunately, I cannot think of anything unusual enough to explain why
it was not detected earlier. Maybe the holiday period just meant less
testing and less attention in general? The problem does not always hang
the machine as it did for Niklāvs, so it may have occurred but gone
unnoticed on devices that are not critical.
> > Then later when
> > attempting to save the L1SS config while handling the Upstream Port,
> > parent->state_saved is true in pci_save_aspm_l1ss_state() resulting in
> > early return and skipping saving bridge's L1SS config because it is
> > assumed to be already saved. Later on restore, junk is written into
> > L1SS config which causes issues with some devices.
> >
> > Remove parent->state_saved check and unconditionally save L1SS config
> > also for the upstream bridge from an Upstream Port which ought to be
> > harmless from correctness point of view. With the Upstream Port check
> > now present, saving the L1SS config more than once for the bridge is no
> > longer a problem (unlike when the parent->state_saved check got
> > introduced into the fix during its development).
> >
> > Fixes: 1db806ec06b7 ("PCI/ASPM: Save parent L1SS config in pci_save_aspm_l1ss_state()")
> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219731
> > Reported-by: Niklāvs Koļesņikovs <pinkflames.linux@xxxxxxxxx>
> > Tested-by: Niklāvs Koļesņikovs <pinkflames.linux@xxxxxxxxx>
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx>
> > ---
> > drivers/pci/pcie/aspm.c | 3 ---
> > 1 file changed, 3 deletions(-)
> >
> > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > index e0bc90597dca..da3e7edcf49d 100644
> > --- a/drivers/pci/pcie/aspm.c
> > +++ b/drivers/pci/pcie/aspm.c
> > @@ -108,9 +108,6 @@ void pci_save_aspm_l1ss_state(struct pci_dev *pdev)
> >  	pci_read_config_dword(pdev, pdev->l1ss + PCI_L1SS_CTL2, cap++);
> >  	pci_read_config_dword(pdev, pdev->l1ss + PCI_L1SS_CTL1, cap++);
> >
> > -	if (parent->state_saved)
> > -		return;
> > -
> >  	/*
> >  	 * Save parent's L1 substate configuration so we have it for
> >  	 * pci_restore_aspm_l1ss_state(pdev) to restore.
> >
> > base-commit: 72deda0abee6e705ae71a93f69f55e33be5bca5c
> > --
> > 2.39.5
> >
>
--
i.