Re: [PATCH v5 2/2] PCI/PM: Fix pci_pm_suspend_noirq() to disable PTM

From: Bjorn Helgaas
Date: Thu May 12 2022 - 13:42:49 EST


Hi Rajvi,

I received your v1, v2, v3, v4, v5 postings because they were sent
directly to bhelgaas@xxxxxxxxxx, but for some reason vger doesn't like
them so they don't show up on the mailing list:

https://lore.kernel.org/all/?q=a%3Arajvi.jingar

I looked at the ones I received directly and don't see an obvious
problem. Maybe there's a hint here?

http://vger.kernel.org/majordomo-info.html

All patches should appear on the linux-pci mailing list before
applying them, so we need to figure this out somehow. In fact, I read
and review patches from linux-pci, so I often don't even see things
that are just sent directly to bhelgaas@xxxxxxxxxx.

On Thu, May 12, 2022 at 03:49:18PM +0200, Rafael J. Wysocki wrote:
> On 4/29/2022 11:05 PM, Rajvi Jingar wrote:
> > For the PCIe devices (like nvme) that do not go into D3 state still need to
> > disable PTM to allow the port to enter a lower-power PM state and the SoC
> > to reach a lower-power idle state as a whole. Move the pci_disable_ptm()
> > out of pci_prepare_to_sleep() as this code path is not followed for devices
> > that do not go into D3. This fixes the issue seen on Dell XPS 9300 with
> > Ice Lake CPU and Dell Precision 5530 with Coffee Lake CPU platforms to get
> > improved residency in low power idle states.

I think the paragraph above is a distraction, and the real reason is
the paragraph below.

> > Also, on receiving a PTM Request from a downstream device, if PTM is
> > disabled on the root port, as per PCIe r6.0, sec 6.21.3, such a request
> > would cause an Unsupported Request error. So it must first disable PTM in
> > any downstream devices.
> >
> > Fixes: a697f072f5da ("PCI: Disable PTM during suspend to save power")
> > Signed-off-by: Rajvi Jingar <rajvi.jingar@xxxxxxxxx>
> > Suggested-by: David E. Box <david.e.box@xxxxxxxxxxxxxxx>
> > ---
> > v1 -> v2: add Fixes tag in commit message
> > v2 -> v3: move changelog after "---" marker
> > v3 -> v4: add "---" marker after changelog
> > v4 -> v5: move pci_disable_ptm() out of the pci_dev->state_saved check.
> > disable PTM for all devices, not just root ports.
> > ---
> > drivers/pci/pci-driver.c | 28 +++++++++++++++++++---------
> > drivers/pci/pci.c | 10 ----------
> > 2 files changed, 19 insertions(+), 19 deletions(-)
> >
> > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > index 8b55a90126a2..400dd18a9cf5 100644
> > --- a/drivers/pci/pci-driver.c
> > +++ b/drivers/pci/pci-driver.c
> > @@ -813,6 +813,7 @@ static int pci_pm_suspend_late(struct device *dev)
> > static int pci_pm_suspend_noirq(struct device *dev)
> > {
> > + unsigned int dev_state_saved;
> > struct pci_dev *pci_dev = to_pci_dev(dev);
> > const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> > @@ -845,16 +846,25 @@ static int pci_pm_suspend_noirq(struct device *dev)
> > }
> > }
> > - if (!pci_dev->state_saved) {
> > + dev_state_saved = pci_dev->state_saved;
>
> If pci_dev->state_saved is set here, the device may be in D3cold already and
> disabling PTM for it will not work.  Of course, it is not necessary to
> disable PTM for it then, but this case need to be taken care of.
>
> > + if (!dev_state_saved)
> > pci_save_state(pci_dev);
> > - /*
> > - * If the device is a bridge with a child in D0 below it, it needs to
> > - * stay in D0, so check skip_bus_pm to avoid putting it into a
> > - * low-power state in that case.
> > - */
> > - if (!pci_dev->skip_bus_pm && pci_power_manageable(pci_dev))
> > - pci_prepare_to_sleep(pci_dev);
> > - }
> > +
> > + /*
> > + * There are systems (for example, Intel mobile chips since Coffee
> > + * Lake) where the power drawn while suspended can be significantly
> > + * reduced by disabling PTM as this allows the SoC to reach a
> > + * lower-power idle state as a whole.

I think the argument for disabling PTM is that:

- If a PTM Requester is put in a low-power state, a PTM Responder
upstream from it may also be put in a low-power state.

- Putting a Port in D1, D2, or D3hot does not prohibit it from
sending or responding to PTM Requests (I'd be glad to be corrected
about this).

- We want to disable PTM on Responders when they are in a low-power
state.

- Per 6.21.3, a PTM Requester must not be enabled when the upstream
PTM Responder is disabled.

- Therefore, we must disable all PTM on all downstream PTM
Requesters before disabling it on the PTM Responder, e.g., a Root
Port.

This has nothing specifically to do with Coffee Lake or other Intel
chips, so I think the comment should be merely something to the
effect that "disabling PTM reduces power consumption."

> Something like this should suffice IMV:
>
> if (!dev_state_saved || pci_dev->current_state != PCI_D3cold)
>
>         pci_disable_ptm(pci_dev);

It makes sense to me that we needn't disable PTM if the device is in
D3cold. But the "!dev_state_saved" condition depends on what the
driver did. Why is that important? Why should we not do the
following?

if (pci_dev->current_state != PCI_D3cold)
pci_disable_ptm(pci_dev);

Bjorn