Re: [PATCH v7 2/2] PCI: Don't put non-power manageable PCIe root ports into D3

From: Rafael J. Wysocki
Date: Wed Jul 12 2023 - 08:13:33 EST


On Wed, Jul 12, 2023 at 12:54 AM Mario Limonciello
<mario.limonciello@xxxxxxx> wrote:
>
> On 7/11/23 17:14, Bjorn Helgaas wrote:
> > [+cc Andy, Intel MID stuff]
> >
> > On Mon, Jul 10, 2023 at 07:53:25PM -0500, Mario Limonciello wrote:
> >> Since commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
> >> PCIe ports from modern machines (>2015) are allowed to be put into D3 by
> >> storing a flag in the `struct pci_dev` structure.
> >
> > It looks like >= 2015 (not >2015). I think "a flag" refers to
> > "bridge_d3".
>
> Yeah.
>
> >
> >> pci_power_manageable() uses this flag to indicate a PCIe port can enter D3.
> >> pci_pm_suspend_noirq() uses the return from pci_power_manageable() to
> >> decide whether to try to put a device into its target state for a sleep
> >> cycle via pci_prepare_to_sleep().
> >>
> >> For devices that support D3, the target state is selected by this policy:
> >> 1. If platform_pci_power_manageable():
> >> Use platform_pci_choose_state()
> >> 2. If the device is armed for wakeup:
> >> Select the deepest D-state that supports a PME.
> >> 3. Else:
> >> Use D3hot.
> >>
> >> Devices are considered power manageable by the platform when they have
> >> one or more objects described in the table in section 7.3 of the ACPI 6.4
> >> specification.
> >
> > No point in citing an old version, so please cite ACPI r6.5, sec 7.3.
> >
> > The spec claims we only need one object from the table for a device to
> > be "power-managed", but in reality, it looks like the only things that
> > actually *control* power are _PRx (the _ON/_OFF methods of Power
> > Resources) and _PSx (ironically only mentioned parenthically).
> >
>
> Your point has me actually wondering if I've got this entirely wrong.
>
> Should we perhaps be looking specifically for the presence of _SxW to
> decide if a given PCIe port can go below D0?

There are two things, _SxW and _SxD, and they shouldn't be confused.

_SxW tells you what the deepest power state from which wakeup can be
signaled by the device (in the given Sx state of the system) is.

_SxD tells you what the deepest power state supported by the device
(in the given Sx state of the system) is.

And note that _SxW is applicable to the device itself, not the
subordinate devices, so I'm not sure how meaningful it is for ports.

pci_target_state() uses both _SxW and _SxD to determine the deepest
state the device can go into and so long as it is used properly, it
shouldn't return a power state that is too deep, so I'm not really
sure why you want this special "should the bridge be allowed to go
into D3hot/cold" routine to double check this.

> IE very similar to what 8133844a8f24 did but for ports that are not
> hotplug bridges.
>
> > This matches up well with acpi_pci_power_manageable(), which returns
> > true if a device has either _PR0 or _PS0.
> >
> > Per ACPI r6.5, sec 7.3, ACPI control of device power states uses
> > Power Resources (i.e., the _ON/_OFF methods of _PRx) or _PSx
> > methods. Hence acpi_pci_power_manageable() checks for the presence
> > of _PR0 or _PS0.
> >
> > Tangent unrelated to *this* patch: I don't know how to think about the
> > pci_use_mid_pm() in platform_pci_power_manageable() because I haven't
> > seen a MID spec. pci_use_mid_pm() isn't dependent on "dev", so we
> > claim *all* PCI devices, even external ones, are power manageable by
> > the platform, which doesn't seem right.
> >
> >> At suspend Linux puts PCIe root ports that are not power manageable by
> >> the platform into D3hot. Windows only puts PCIe root ports into D3 when
> >> they are power manageable by the platform.
> >>
> >> The policy selected for Linux to put non-power manageable PCIe root ports
> >> into D3hot at system suspend doesn't match anything in the PCIe or ACPI
> >> specs.
> >>
> >> Linux shouldn't assume PCIe root ports support D3 just because
> >> they're on a machine newer than 2015, the ports should also be considered
> >> power manageable by the platform.
> >>
> >> Add an extra check for PCIe root ports to ensure D3 isn't selected for
> >> them if they are not power-manageable through platform firmware.
> >> This will avoid pci_pm_suspend_noirq() changing the power state
> >> via pci_prepare_to_sleep().
> >>
> >> The check is focused on PCIe root ports because they are part of
> >> the platform. Other PCIe bridges may be connected externally and thus
> >> cannot impose platform specific limitations.
> >>
> >> Link: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/07_Power_and_Performance_Mgmt/device-power-management-objects.html [1]
>
> At least for myself when I respin this, here is the 6.5 link.
> https://uefi.org/specs/ACPI/6.5/07_Power_and_Performance_Mgmt.html#device-power-management-objects
>
> >> Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
> >> Reported-by: Iain Lane <iain@xxxxxxxxxxxxxxxxxxx>
> >> Closes: https://forums.lenovo.com/t5/Ubuntu/Z13-can-t-resume-from-suspend-with-external-USB-keyboard/m-p/5217121
> >> Acked-by: Rafael J. Wysocki <rafael@xxxxxxxxxx>
> >> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx>
> >> Reviewed-by: Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx>
> >> Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx>
> >> ---
> >> v6->v7:
> >> * revert back to v5 code, rewrite commit message to specific examples
> >> and be more generic
> >> ---
> >> drivers/pci/pci.c | 8 ++++++++
> >> 1 file changed, 8 insertions(+)
> >>
> >> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> >> index f916fd76eba79..4be8c6f8f4ebe 100644
> >> --- a/drivers/pci/pci.c
> >> +++ b/drivers/pci/pci.c
> >> @@ -3041,6 +3041,14 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
> >> if (dmi_check_system(bridge_d3_blacklist))
> >> return false;
> >>
> >> + /*
> >> + * It's not safe to put root ports that aren't power manageable
> >> + * by the platform into D3.
> >
> > Does this refer specifically to D3cold?
> >
>
> No, it's intentionally not saying D3hot or D3cold because it's stored to
> "bridge_d3" which is used for both D3hot and D3cold.
>
> > I assume that if we were talking about D3hot, we wouldn't need to
> > check for ACPI support because D3hot behavior should be fully covered
> > by the PCIe spec.
>
> Right; the PCIe spec indicates that D3hot should be supported by all
> devices and has rules about when you can go into D3hot like not allowing
> it unless child devices are already in D3.
>
> Naïvely you would think that means pci_bridge_d3_possible() shouldn't
> have any of these checks, but they've all obviously all been grown for a
> reason.
>
> The value from pci_bridge_d3_possible() is used both "at suspend" and
> "runtime", but what we're really talking with this issue is is the
> behavior "at suspend".

Which is suspend-to-idle AFAICS, so from the ACPI perspective it is
all S0 anyway.

> >
> > Let's be specific about D3hot vs D3cold whenever possible.
> >
> >> + if (pci_pcie_type(bridge) == PCI_EXP_TYPE_ROOT_PORT &&
> >> + !platform_pci_power_manageable(bridge))
> >> + return false;
> >
> > If ACPI says a device is not power-manageable, i.e., ACPI doesn't know
> > how to put it in D0, it makes sense to return "false" here so we don't
> > try to put it in D3cold.
> >
> > But I don't understand the ROOT_PORT check. We may have a Switch
> > described via ACPI, and the ROOT_PORT check means we can return "true"
> > (it's OK to use D3cold) even if the Switch Port is not power-manageable
> > via ACPI.
>
> This feels a lot more of a potential to cause regressions.
>
> Something we could do is include the value for bridge->untrusted in the
> decision, but I'm not convinced that's correct.
>
> Another option can be to merge a series of changes like this:
>
> 1) My v5 patch that adds the quirks for the two known broken machines
> 2) Patch 1/2 from v7
> 3) Patch 2/2 from v7
> 4) Another patch to drop the root port check here
>
> #1 could go to 6.5-rcX as it's riskless. #2-4 could go to linux-next
> and if they work out not to cause any problems we could revert #1.
>
> If they cause problems we come back to the drawing table to find the
> right balance.

Generally speaking, pci_bridge_d3_possible() is there to prevent
bridges (and PCIe ports in particular) from being put into D3hot/cold
if there are reasons to believe that it may not work.
acpi_pci_bridge_d3() is part of that.

Even if it returns 'true', the _SxD/_SxW check should still be applied
via pci_target_state() to determine whether or not the firmware allows
this particular bridge to go into D3hot/cold. So arguably, the _SxW
check in acpi_pci_bridge_d3() should not be necessary and if it makes
any functional difference, there is a bug somewhere else.