Re: [PATCH v3 1/2] PCI: PCIe: ASPM: Introduce pcie_aspm_enabled()

From: Rafael J. Wysocki
Date: Tue Oct 08 2019 - 18:54:52 EST


On Tue, Oct 8, 2019 at 11:16 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> On Tue, Oct 08, 2019 at 11:27:51AM +0200, Rafael J. Wysocki wrote:
> > On Tue, Oct 8, 2019 at 12:34 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > On Thu, Aug 08, 2019 at 11:55:07PM +0200, Rafael J. Wysocki wrote:
> > > > From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> > > >
> > > > Add a function checking whether or not PCIe ASPM has been enabled for
> > > > a given device.
> > > >
> > > > It will be used by the NVMe driver to decide how to handle the
> > > > device during system suspend.
> > > >
> > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> > > > ---
> > > >
> > > > v2 -> v3:
> > > > * Make the new function return bool.
> > > > * Change its name back to pcie_aspm_enabled().
> > > > * Fix kerneldoc comment formatting.
> > > >
> > > > -> v2:
> > > > * Move the PCI/PCIe ASPM changes to a separate patch.
> > > > * Add the _mask suffix to the new function name.
> > > > * Add EXPORT_SYMBOL_GPL() to the new function.
> > > > * Avoid adding an unnecessary blank line.
> > > >
> > > > ---
> > > > drivers/pci/pcie/aspm.c | 20 ++++++++++++++++++++
> > > > include/linux/pci.h | 3 +++
> > > > 2 files changed, 23 insertions(+)
> > > >
> > > > Index: linux-pm/drivers/pci/pcie/aspm.c
> > > > ===================================================================
> > > > --- linux-pm.orig/drivers/pci/pcie/aspm.c
> > > > +++ linux-pm/drivers/pci/pcie/aspm.c
> > > > @@ -1170,6 +1170,26 @@ static int pcie_aspm_get_policy(char *bu
> > > > module_param_call(policy, pcie_aspm_set_policy, pcie_aspm_get_policy,
> > > > NULL, 0644);
> > > >
> > > > +/**
> > > > + * pcie_aspm_enabled - Check if PCIe ASPM has been enabled for a device.
> > > > + * @pci_device: Target device.
> > > > + */
> > > > +bool pcie_aspm_enabled(struct pci_dev *pci_device)
> > > > +{
> > > > + struct pci_dev *bridge = pci_upstream_bridge(pci_device);
> > > > + bool ret;
> > > > +
> > > > + if (!bridge)
> > > > + return false;
> > > > +
> > > > + mutex_lock(&aspm_lock);
> > > > + ret = bridge->link_state ? !!bridge->link_state->aspm_enabled : false;
> > > > + mutex_unlock(&aspm_lock);
> > >
> > > Why do we need to acquire aspm_lock here? We aren't modifying
> > > anything, and I don't think we're preventing a race. If this races
> > > with another thread that changes aspm_enabled, we'll return either the
> > > old state or the new one, and I think that's still the case even if we
> > > don't acquire aspm_lock.
> >
> > Well, if we can guarantee that pci_remove_bus_device() will never be
> > called in parallel with this helper, then I agree, but can we
> > guarantee that?
>
> Hmm, yeah, I guess that's the question. It's not a race with another
> thread changing aspm_enabled; the potential race is with another
> thread removing the last child of "bridge", which will free the
> link_state and set bridge->link_state = NULL.
>
> I think it should be safe to call device-related PCI interfaces if
> you're holding a reference to the device, e.g., from a driver bound to
> the device or a sysfs accessor. Since we call pcie_aspm_enabled(dev)
> from a driver bound to "dev", another thread should not be able to
> remove "dev" while we're using it.
>
> I know that's a little hand-wavey, but if it weren't true, I think
> we'd have a lot more locking sprinkled everywhere in the PCI core than
> we do.
>
> This has implications for Heiner's ASPM sysfs patches because we're
> currently doing this in sysfs accessors:
>
> static ssize_t aspm_attr_show_common(struct device *dev, ...)
> {
> ...
> link = pcie_aspm_get_link(pdev);
>
> mutex_lock(&aspm_lock);
> enabled = link->aspm_enabled & state;
> mutex_unlock(&aspm_lock);
> ...
> }
>
> I assume sysfs must be holding a reference that guarantees "dev" is
> valid throughout this code, and therefore we should not need to hold
> aspm_lock.

In principle, pcie_aspm_enabled() need not be called via sysfs.

In the particular NVMe use case, it is called from the driver's own PM
callback, so it would be safe without the locking AFAICS.

I guess it is safe to drop the locking from pcie_aspm_enabled(), but
then it would be good to mention in the kerneldoc that calling it is
only safe under the assumption that the link_state object of the
upstream bridge cannot go away while it is running.
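
For illustration only, a lockless variant with such a kerneldoc note
might look roughly like the sketch below. It is a user-space model, not
the actual patch: the two structures are cut down to the fields used
here, the "upstream" field and the trivial pci_upstream_bridge() are
stand-ins for the real helper, and the wording of the comment is only a
suggestion.

```c
#include <stdbool.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel structures (assumption: only the
 * fields this helper touches are modeled). */
struct pcie_link_state {
	unsigned int aspm_enabled;	/* bitmask of enabled ASPM states */
};

struct pci_dev {
	struct pci_dev *upstream;	/* hypothetical field, models the upstream bridge */
	struct pcie_link_state *link_state;
};

/* Stand-in for the real pci_upstream_bridge(); NULL means no bridge. */
static struct pci_dev *pci_upstream_bridge(struct pci_dev *dev)
{
	return dev->upstream;
}

/**
 * pcie_aspm_enabled - Check if PCIe ASPM has been enabled for a device.
 * @pci_device: Target device.
 *
 * No locking is done here.  The caller must ensure that the link_state
 * object of the upstream bridge cannot go away while this function is
 * running, for example by calling it from a driver bound to @pci_device.
 */
bool pcie_aspm_enabled(struct pci_dev *pci_device)
{
	struct pci_dev *bridge = pci_upstream_bridge(pci_device);

	if (!bridge)
		return false;

	/* Either the old or the new value may be observed if aspm_enabled
	 * is being changed concurrently; both are acceptable results. */
	return bridge->link_state ? !!bridge->link_state->aspm_enabled : false;
}
```

The only difference from the posted version is that the aspm_lock
acquisition is gone and the calling-context requirement is written
down in the comment.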