Re: [PATCH v14.a 1/1] PCI: Only put Intel PCIe ports >= 2015 into D3

From: Rafael J. Wysocki
Date: Tue Aug 22 2023 - 06:11:36 EST


On Tue, Aug 22, 2023 at 12:42 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> On Fri, Aug 18, 2023 at 02:39:32PM -0500, Mario Limonciello wrote:
> > commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
> > changed pci_bridge_d3_possible() so that any vendor's PCIe ports
> > from modern machines (>=2015) are allowed to be put into D3.
> >
> > Iain reports that USB devices can't be used to wake a Lenovo Z13
> > from suspend. This is because the PCIe root port has been put
> > into D3 and AMD's platform can't handle USB devices waking in this
> > case.
> >
> > This behavior is only reported on Linux. Comparing the behavior
> > on Windows and Linux, Windows doesn't put the root ports into D3.
> >
> > To fix the issue without regressing existing Intel systems,
> > limit the >=2015 check to only apply to Intel PCIe ports.
> >
> > Cc: stable@xxxxxxxxxxxxxxx
> > Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
> > Reported-by: Iain Lane <iain@xxxxxxxxxxxxxxxxxxx>
> > Closes: https://forums.lenovo.com/t5/Ubuntu/Z13-can-t-resume-from-suspend-with-external-USB-keyboard/m-p/5217121
> > Reviewed-by:Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx>
> > Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx>
> > ---
> > In v14 this series has been split into 3 parts.
> > part A: Immediate fix for AMD issue.
> > part B: LPS0 export improvements
> > part C: Long term solution for all vendors
> > v13->v14:
> > * Reword the comment
> > * add tag
> > v12->v13:
> > * New patch
> > ---
> > drivers/pci/pci.c | 11 ++++++++---
> > 1 file changed, 8 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 60230da957e0c..bfdad2eb36d13 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -3037,10 +3037,15 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
> > return false;
> >
> > /*
> > - * It should be safe to put PCIe ports from 2015 or newer
> > - * to D3.
> > + * Allow Intel PCIe ports from 2015 onward to go into D3 to
> > + * achieve additional energy conservation on some platforms.
> > + *
> > + * This is only set for Intel PCIe ports as it causes problems
> > + * on both AMD Rembrandt and Phoenix platforms where USB keyboards
> > + * can not be used to wake the system from suspend.
> > */
> > - if (dmi_get_bios_year() >= 2015)
> > + if (bridge->vendor == PCI_VENDOR_ID_INTEL &&
> > + dmi_get_bios_year() >= 2015)
> > return true;
>
> Hmm. I'm really not a fan of checks like this that aren't connected
> to an actual property of the platform. The Intel Vendor ID tells us
> nothing about what the actual problem is, which makes it really hard
> to maintain in the future. It's also very AMD- and Intel-centric,
> when this code is ostensibly arch-agnostic, so this potentially
> regresses ARM64, RISC-V, powerpc, etc.

That's a fair point.

Would it be better to reverse this and filter out AMD systems as they
are affected by the existing check?

> It's bad enough that we check for 2015. A BIOS security update to a
> 2014 platform will break things,

Well, not necessarily. Pre-2015 systems already worked and the check
was added as "surely, everything 2015 or newer should work either".
While it is true that putting PCIe Root Ports into D3hot was necessary
for extra energy conservation on Intel systems, it actually has been
expected to work everywhere.

> even though the update has nothing to
> do with D3. We're stuck with that one, and it's old enough that maybe
> it won't bite us any more, but I hate to add more.

Well, how would you like to deal with the systems that don't work
today, because they expect a different behavior?

Effectively, the current behavior for all modern systems is to allow
bridge D3 if there are no indications that it shouldn't be allowed.
The platforms in question assume the reverse, so what else can be
done?

> The list of conditions in pci_bridge_d3_possible() is a pretty good
> clue that we don't really know what we're doing, and all we can do is
> find configurations that happen to work.

Yes, because by the spec it all should work just fine. The PCI PM 1.2
specification defines the expected behavior for bridges and the PCIe
specification claims to be a superset of that.

What we need to deal with here is basically non-compliant systems and
so we have to catch the various forms of non-compliance.

> I don't have any better suggestions, other than that this should be
> described somehow via ACPI (and not in vendor-specific stuff like
> PNP0D80).

Well, it isn't in practice.