Re: [RESEND PATCH v3 2/2] PCI/PTM: fix to maintain pci_dev->ptm_enabled
From: Rafael J. Wysocki
Date: Wed Aug 31 2022 - 13:53:56 EST
On Tue, Aug 30, 2022 at 9:46 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> On Tue, Aug 30, 2022 at 08:03:41PM +0200, Rafael J. Wysocki wrote:
> > On Tue, Aug 30, 2022 at 7:37 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > On Tue, Aug 30, 2022 at 06:58:20PM +0200, Rafael J. Wysocki wrote:
> > > > On Tue, Aug 30, 2022 at 6:25 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > > > On Tue, Aug 30, 2022 at 03:49:13AM -0700, Rajvi Jingar wrote:
> > > > > > pci_dev->ptm_enabled needs to be maintained to reflect the current PTM
> > > > > > state of the device. In pci_ptm_disable(), clear ptm_enabled from
> > > > > > 'struct pci_dev' on disabling PTM state for the device.
> > > > > > In pci_restore_ptm_state(), set dev->ptm_enabled based on the restored
> > > > > > PTM state of the device.
> > > > > >
> > > > > > In pci_ptm_disable(), perform ptm_enabled check to avoid config space
> > > > > > access in case if PTM is already disabled for the device. ptm_enabled
> > > > > > won't be set for non-PCIe devices so pci_is_pcie(dev) check is not
> > > > > > needed anymore.
> > > > >
> > > > > This one sounds like it's supposed to fix something, but I'm not clear
> > > > > exactly what.
> > > > >
> > > > > I have a vague memory of config accesses messing up a low power state.
> > > > > But this is still completely magical and unmaintainable since AFAIK
> > > > > there is nothing in the PCIe spec about avoiding config accesses when
> > > > > PTM is disabled.
> > >
> > > I'm remembering this, which seemed like an ancestor of this patch:
> > > https://lore.kernel.org/r/CAJZ5v0gNy6YJA+RNTEyHBdoJK-jqKN60oU_k_LX4=cTuyoO2mg@xxxxxxxxxxxxxx
> > >
> > > This patch is queued up and does something similar (disabling PTM on
> > > all devices before suspend):
> > > https://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/commit/?id=d878400c7d98
> > >
> > > Is d878400c7d98 enough to solve the functional issue, and this patch
> > > is basically a cleanup? I think it's a nice cleanup and worth doing.
> >
> > This patch is independent of d878400c7d98. We've been working on a
> > d878400c7d98 counterpart on top of this patch.
> >
> > There is a problem with d878400c7d98 that disabling PTM from
> > pci_prepare_to_sleep() is not enough, because that function is not
> > called for some endpoints where we also want to disable PTM on system
> > suspend.
> >
> > IMV the most suitable place to disable PTM (temporarily) on
> > system-wide suspend is in pci_pm_suspend_noirq(, because it is the
> > last piece of generic PCI code running for all PCI devices regardless
> > of what their drivers do.
> >
> > > I'm just trying to figure out the "avoid config space access" in the
> > > commit log. If avoiding config space access is necessary, it needs
> > > more explanation.
>
> AFAICT, these are great cleanups but neither of these patches is
> really a functional change. If that's true,
It is true.
> the "avoid config space access" should be removed from the commit log.
Agreed.
> I dropped d878400c7d98 for now and updated my pci/pm branch with just
> these two patches with updates as attached below.
They look good to me.
> Then we can rework d878400c7d98 so it disables PTM unconditionally in
> pci_pm_suspend_noirq() instead of in pci_prepare_to_sleep().
In the meantime, I recalled that nvme wanted to leave the device in D0
and program it into an internal low-power state in some cases which
would be disturbed by disabling PTM later on (a config space write
would kick the device out of the internal low-power state).
So it looks like it would be better to disable PTM as the first thing
in pci_pm_suspend() before calling the driver's suspend callback
(which may be nvme suspend), but then we'd need to save the original
PTM status and restore it during the subsequent resume. That could be
done as early as in pci_pm_resume_noirq(), but I think the cleanest
way would be to add a new bit to struct pci_device for that.
Alternatively, we can drop the $subject patch, so ptm_enabled still
only means that it has been enabled during enumeration and it can be
used to restore the original PTM status during resume.
> Currently, or with d878400c7d98, we only call pci_prepare_to_sleep()
> when (!skip_bus_pm && !state_saved && pci_power_manageable()), so if a
> driver saves its own state, we won't disable PTM, which seems like a
> problem.
Agreed.
> I assume we also want to move the pci_disable_ptm() from
> pci_finish_runtime_suspend() to pci_pm_runtime_suspend() to keep it
> parallel with pci_pm_suspend_noirq().
I think so.
> Is this making sense?
Yes, it is, modulo the above observations.
> > > > Because ptm_enabled is expected to always reflect the hardware state,
> > > > pci_disable_ptm() needs to be amended to clear it. Also it is prudent
> > > > to explicitly make it reflect the new hardware state in
> > > > pci_restore_ptm_state().
> > > >
> > > > Then, pci_disable_ptm() can be made bail out if ptm_enabled is clear,
> > > > because it has nothing to do then and the pci_is_pcie() check in there
> > > > is not necessary, because ptm_enabled will never be set for devices
> > > > that are not PCIe.
> > > >
> > > > > At the very least, we would need more details in the commit log and
> > > > > a hint in the code about this.
>
> commit 22b07d9ddd02 ("PCI/PTM: Update pdev->ptm_enabled to track hardware state")
> Author: Rajvi Jingar <rajvi.jingar@xxxxxxxxxxxxxxx>
> Date: Tue Aug 30 03:49:13 2022 -0700
>
> PCI/PTM: Update pdev->ptm_enabled to track hardware state
>
> Update pdev->ptm_enabled to track hardware state when we disable or restore
> PTM state.
>
> No functional change intended, since 'ptm_enabled' was previously only
> tested during enumeration and pci_disable_ptm() is only used during
> suspend.
>
> [bhelgaas: commit log]
> Link: https://lore.kernel.org/r/20220830104913.1620539-2-rajvi.jingar@xxxxxxxxxxxxxxx
> Signed-off-by: Rajvi Jingar <rajvi.jingar@xxxxxxxxxxxxxxx>
> Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>
> diff --git a/drivers/pci/pcie/ptm.c b/drivers/pci/pcie/ptm.c
> index 368a254e3124..1ce241d4538f 100644
> --- a/drivers/pci/pcie/ptm.c
> +++ b/drivers/pci/pcie/ptm.c
> @@ -34,7 +34,7 @@ void pci_disable_ptm(struct pci_dev *dev)
> int ptm;
> u16 ctrl;
>
> - if (!pci_is_pcie(dev))
> + if (!dev->ptm_enabled)
> return;
>
> ptm = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_PTM);
> @@ -44,6 +44,7 @@ void pci_disable_ptm(struct pci_dev *dev)
> pci_read_config_word(dev, ptm + PCI_PTM_CTRL, &ctrl);
> ctrl &= ~(PCI_PTM_CTRL_ENABLE | PCI_PTM_CTRL_ROOT);
> pci_write_config_word(dev, ptm + PCI_PTM_CTRL, ctrl);
> + dev->ptm_enabled = 0;
> }
>
> void pci_save_ptm_state(struct pci_dev *dev)
> @@ -83,6 +84,7 @@ void pci_restore_ptm_state(struct pci_dev *dev)
>
> cap = (u16 *)&save_state->cap.data[0];
> pci_write_config_word(dev, ptm + PCI_PTM_CTRL, *cap);
> + dev->ptm_enabled = !!(*cap & PCI_PTM_CTRL_ENABLE);
> }
>
> void pci_ptm_init(struct pci_dev *dev)
>
> commit 6e594f65471b ("PCI/PM: Simplify pci_pm_suspend_noirq()")
> Author: Rajvi Jingar <rajvi.jingar@xxxxxxxxxxxxxxx>
> Date: Tue Aug 30 03:49:12 2022 -0700
>
> PCI/PM: Simplify pci_pm_suspend_noirq()
>
> We always want to save the device state unless the driver has already done
> it. Rearrange the checking in pci_pm_suspend_noirq() to make this more
> clear. No functional change intended.
>
> [bhelgaas: commit log, rewrap comment]
> Link: https://lore.kernel.org/r/20220830104913.1620539-1-rajvi.jingar@xxxxxxxxxxxxxxx
> Signed-off-by: Rajvi Jingar <rajvi.jingar@xxxxxxxxxxxxxxx>
> Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 49238ddd39ee..2815922ac525 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -867,20 +867,15 @@ static int pci_pm_suspend_noirq(struct device *dev)
> }
> }
>
> - if (pci_dev->skip_bus_pm) {
> + if (!pci_dev->state_saved) {
> + pci_save_state(pci_dev);
> +
> /*
> - * Either the device is a bridge with a child in D0 below it, or
> - * the function is running for the second time in a row without
> - * going through full resume, which is possible only during
> - * suspend-to-idle in a spurious wakeup case. The device should
> - * be in D0 at this point, but if it is a bridge, it may be
> - * necessary to save its state.
> + * If the device is a bridge with a child in D0 below it,
> + * it needs to stay in D0, so check skip_bus_pm to avoid
> + * putting it into a low-power state in that case.
> */
> - if (!pci_dev->state_saved)
> - pci_save_state(pci_dev);
> - } else if (!pci_dev->state_saved) {
> - pci_save_state(pci_dev);
> - if (pci_power_manageable(pci_dev))
> + if (!pci_dev->skip_bus_pm && pci_power_manageable(pci_dev))
> pci_prepare_to_sleep(pci_dev);
> }
>