Re: [PATCH v7 2/2] PCI: Don't put non-power manageable PCIe root ports into D3
From: Rafael J. Wysocki
Date: Fri Jul 14 2023 - 15:18:12 EST
On Wed, Jul 12, 2023 at 6:09 PM Limonciello, Mario
<mario.limonciello@xxxxxxx> wrote:
>
> On 7/12/2023 07:13, Rafael J. Wysocki wrote:
> > On Wed, Jul 12, 2023 at 12:54 AM Mario Limonciello
> > <mario.limonciello@xxxxxxx> wrote:
> >>
> >> On 7/11/23 17:14, Bjorn Helgaas wrote:
> >>> [+cc Andy, Intel MID stuff]
> >>>
> >>> On Mon, Jul 10, 2023 at 07:53:25PM -0500, Mario Limonciello wrote:
> >>>> Since commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
> >>>> PCIe ports from modern machines (>2015) are allowed to be put into D3 by
> >>>> storing a flag in the `struct pci_dev` structure.
> >>>
> >>> It looks like >= 2015 (not >2015). I think "a flag" refers to
> >>> "bridge_d3".
> >>
> >> Yeah.
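The eligibility test Bjorn refers to can be sketched as below. This is a simplified, hypothetical model and not the kernel's pci_bridge_d3_possible(). The struct and its fields (is_pcie_port, on_denylist, bios_year) are assumptions made for illustration, and the real function consults more conditions. The point is only that the year cut-off is inclusive, so a machine whose firmware reports 2015 already qualifies.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of a PCIe port, for this sketch only. */
struct port_info {
	bool is_pcie_port;   /* root port, upstream or downstream port */
	bool on_denylist;    /* platform known to misbehave with D3 ports */
	int bios_year;       /* year reported by the firmware (e.g. via DMI) */
};

/*
 * Would the (hypothetical) bridge_d3 flag be set for this port?
 * Note the comparison is ">= 2015", not "> 2015".
 */
static bool port_d3_allowed(const struct port_info *p)
{
	if (!p->is_pcie_port)
		return false;
	if (p->on_denylist)
		return false;
	return p->bios_year >= 2015;
}
```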
> >>
> >>>
> >>>> pci_power_manageable() uses this flag to indicate a PCIe port can enter D3.
> >>>> pci_pm_suspend_noirq() uses the return from pci_power_manageable() to
> >>>> decide whether to try to put a device into its target state for a sleep
> >>>> cycle via pci_prepare_to_sleep().
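How the flag gates the suspend path can be sketched as follows. As we understand it, pci_power_manageable() treats devices without a subordinate bus as always eligible and bridges as eligible only when bridge_d3 is set; the struct below is a hypothetical reduction of struct pci_dev, and the sketch ignores everything else pci_pm_suspend_noirq() checks.

```c
#include <stdbool.h>

/* Hypothetical, reduced device model used only for this sketch. */
struct dev_model {
	bool has_subordinate; /* true for bridges/ports with a secondary bus */
	bool bridge_d3;       /* the flag discussed above */
};

/*
 * Endpoints are always candidates for a low-power target state;
 * bridges and ports only when bridge_d3 is set. In the suspend
 * "noirq" phase, a false result means no D-state change is attempted.
 */
static bool power_manageable(const struct dev_model *d)
{
	if (!d->has_subordinate)
		return true;
	return d->bridge_d3;
}
```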
> >>>>
> >>>> For devices that support D3, the target state is selected by this policy:
> >>>> 1. If platform_pci_power_manageable():
> >>>>    Use platform_pci_choose_state()
> >>>> 2. If the device is armed for wakeup:
> >>>>    Select the deepest D-state that supports a PME.
> >>>> 3. Else:
> >>>>    Use D3hot.
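The three-step policy above can be sketched as a single function. This is an illustrative model, not pci_target_state() itself. The struct target_inputs fields are hypothetical stand-ins: platform_choice for what platform_pci_choose_state() would return, and pme_mask with bit n set when PME can be signaled from D-state n. D3cold is left out to keep the sketch short.

```c
#include <stdbool.h>

enum dstate { D0 = 0, D1, D2, D3HOT };

/* Hypothetical inputs to the policy; not the kernel's struct pci_dev. */
struct target_inputs {
	bool platform_manageable;    /* platform_pci_power_manageable() */
	enum dstate platform_choice; /* platform_pci_choose_state() result */
	bool wakeup_armed;           /* device armed for wakeup */
	unsigned pme_mask;           /* bit n: PME supported from D-state n */
};

static enum dstate choose_target(const struct target_inputs *in)
{
	/* 1. If the platform manages the device, it picks the state. */
	if (in->platform_manageable)
		return in->platform_choice;

	/* 2. Armed for wakeup: deepest state from which PME works. */
	if (in->wakeup_armed) {
		for (int s = D3HOT; s >= D0; s--)
			if (in->pme_mask & (1u << s))
				return (enum dstate)s;
		/* No PME-capable state at all: fall through to step 3. */
	}

	/* 3. Otherwise use D3hot. */
	return D3HOT;
}
```

Note that ACPI objects such as _SxD/_SxW can only influence the result through step 1, which is the crux of the discussion further down.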
> >>>>
> >>>> Devices are considered power manageable by the platform when they have
> >>>> one or more objects described in the table in section 7.3 of the ACPI 6.4
> >>>> specification.
> >>>
> >>> No point in citing an old version, so please cite ACPI r6.5, sec 7.3.
> >>>
> >>> The spec claims we only need one object from the table for a device to
> >>> be "power-managed", but in reality, it looks like the only things that
> >>> actually *control* power are _PRx (the _ON/_OFF methods of Power
> >>> Resources) and _PSx (ironically only mentioned parenthetically).
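Bjorn's distinction between being "power-managed" in the spec's sense and actually having a way to change power can be expressed as two predicates. The bitmask below is hypothetical and covers only a few representative objects from the ACPI sec 7.3 table; it shows that a device exposing only, say, _SxW counts as power-managed by the spec wording while having nothing that controls its power.

```c
#include <stdbool.h>

/* Hypothetical bitmask of ACPI objects found under a device node. */
enum acpi_pm_obj {
	OBJ_PS0 = 1u << 0,  /* _PSx: power-state methods (two shown) */
	OBJ_PS3 = 1u << 1,
	OBJ_PR0 = 1u << 2,  /* _PRx: power resources with _ON/_OFF */
	OBJ_PR3 = 1u << 3,
	OBJ_SXD = 1u << 4,  /* _S1D .. _S4D */
	OBJ_SXW = 1u << 5,  /* _S0W .. _S4W */
	OBJ_PRW = 1u << 6,  /* wake-related objects such as _PRW */
};

/* Spec wording: any one object from the table is enough. */
static bool spec_power_managed(unsigned objs)
{
	return objs != 0;
}

/* Only _PSx methods and _PRx power resources actually change power. */
static bool controls_power(unsigned objs)
{
	return (objs & (OBJ_PS0 | OBJ_PS3 | OBJ_PR0 | OBJ_PR3)) != 0;
}
```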
> >>>
> >>
> >> Your point has me actually wondering if I've got this entirely wrong.
> >>
> >> Should we perhaps be looking specifically for the presence of _SxW to
> >> decide if a given PCIe port can go below D0?
> >
> > There are two things, _SxW and _SxD, and they shouldn't be confused.
> >
> > _SxW tells you what the deepest power state from which wakeup can be
> > signaled by the device (in the given Sx state of the system) is.
> >
> > _SxD tells you what the deepest power state supported by the device
> > (in the given Sx state of the system) is.
> >
> > And note that _SxW is applicable to the device itself, not the
> > subordinate devices, so I'm not sure how meaningful it is for ports.
> >
> > pci_target_state() uses both _SxW and _SxD to determine the deepest
> > state the device can go into and so long as it is used properly, it
> > shouldn't return a power state that is too deep, so I'm not really
> > sure why you want this special "should the bridge be allowed to go
> > into D3hot/cold" routine to double check this.
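How two firmware values can bound the choice is sketched below, loosely following the kernel's ACPI code, which (as we read it) uses the _SxD value as the shallowest state the device may be put in and the _SxW value as the deepest state from which wakeup still works. The struct sx_objs and pick_state() are hypothetical; -1 marks an absent object.

```c
#include <stdbool.h>

enum dstate { D0 = 0, D1, D2, D3HOT, D3COLD };

/* Hypothetical results of evaluating the ACPI objects; -1 = absent. */
struct sx_objs {
	int sxd;  /* _SxD: shallowest D-state allowed in the target Sx */
	int sxw;  /* _SxW: deepest D-state that can still signal wakeup */
};

/*
 * Returns the deepest permitted state, or -1 if the bounds conflict.
 * Without wakeup, _SxW does not limit the result.
 */
static int pick_state(const struct sx_objs *o, bool wakeup)
{
	int d_min = o->sxd >= 0 ? o->sxd : D0;
	int d_max = D3COLD;

	if (wakeup && o->sxw >= 0)
		d_max = o->sxw;

	if (d_max < d_min)
		return -1;  /* firmware describes an impossible combination */
	return d_max;
}
```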
>
> Because pci_target_state() only looks at _SxW and _SxD *if* the PCI device
> is power manageable by ACPI. That's why this change injects that
> extra check.
I see. We seem to be getting to the bottom of the problem.
[cut]
> >
> > Generally speaking, pci_bridge_d3_possible() is there to prevent
> > bridges (and PCIe ports in particular) from being put into D3hot/cold
> > if there are reasons to believe that it may not work.
> > acpi_pci_bridge_d3() is part of that.
> >
> > Even if it returns 'true', the _SxD/_SxW check should still be applied
> > via pci_target_state() to determine whether or not the firmware allows
> > this particular bridge to go into D3hot/cold. So arguably, the _SxW
> > check in acpi_pci_bridge_d3() should not be necessary and if it makes
> > any functional difference, there is a bug somewhere else.
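Rafael's two-layer argument can be modelled as below. The struct bridge_checks and its fields are hypothetical: d3_possible stands for the outcome of the pci_bridge_d3_possible()-style gate, and fw_target for whatever the _SxD/_SxW-aware target computation allows. The gate can only veto; the depth is still decided by the second layer, which is why a second _SxW test inside the gate should be redundant.

```c
#include <stdbool.h>

enum dstate { D0 = 0, D1, D2, D3HOT };

/* Hypothetical outcome of the two independent checks. */
struct bridge_checks {
	bool d3_possible;       /* gate: no known reason D3 would break */
	enum dstate fw_target;  /* limit from the _SxD/_SxW-aware logic */
};

static enum dstate bridge_sleep_state(const struct bridge_checks *c)
{
	/* Layer 1: a vetoed port gets no transition (modelled as D0). */
	if (!c->d3_possible)
		return D0;

	/* Layer 2: firmware limits still cap how deep the port goes. */
	return c->fw_target;
}
```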
>
> But only if it was power manageable would the _SxD/_SxW check be
> applied. This issue is around the branch of pci_target_state() where
> it's not power manageable and so it uses PME or it falls back to D3hot.
Well, this looks like a spec interpretation difference.
We thought that _SxD/_SxW would only be relevant for devices with ACPI
PM support, but the firmware people seem to think that those objects
are also relevant for PCI devices that don't have ACPI PM support
(because those devices are still power-manageable via PMCSR). If
Windows agrees with that viewpoint, we'll need to adjust, but not
through adding _SxW checks in random places.
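The two readings of the spec can be contrasted with a single parameter in one place, the target-state computation. This is only an illustration of the difference, not the change that was eventually made. The struct dev_desc, its fields and the sx_for_all switch are hypothetical, and only the _SxW cap is modelled; every other path resolves to D3hot.

```c
#include <stdbool.h>

enum dstate { D0 = 0, D1, D2, D3HOT };

/* Hypothetical device description for this sketch. */
struct dev_desc {
	bool acpi_pm;  /* has _PSx/_PRx, i.e. ACPI can change its power */
	int sxw;       /* _SxW value, or -1 if the object is absent */
};

/*
 * sx_for_all == false: _SxW only caps ACPI-PM devices (the reading
 *                      the kernel followed so far).
 * sx_for_all == true:  _SxW also caps PMCSR-driven transitions (the
 *                      firmware-side reading).
 */
static enum dstate wake_target(const struct dev_desc *d, bool sx_for_all)
{
	enum dstate t = D3HOT;

	if ((d->acpi_pm || sx_for_all) && d->sxw >= 0 && d->sxw < (int)t)
		t = (enum dstate)d->sxw;
	return t;
}
```

For a root port without _PSx/_PRx whose _SxW says D0, the first reading still yields D3hot while the second keeps it in D0.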