Re: [PATCH] drivers: pci: Clear ACS state at kexec

From: Deepa Dinamani
Date: Wed Jan 08 2020 - 12:39:23 EST


On Mon, Jan 6, 2020 at 6:38 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> On Sat, Jan 04, 2020 at 02:51:49PM -0800, Deepa Dinamani wrote:
> > ACS bits remain sticky through kexec reset. This is not really a
> > problem for Linux because turning IOMMU on assumes ACS on. But,
> > this becomes a problem if we kexec into something other than
> > Linux and that does not turn ACS on always.
>
> "Sticky" in the PCIe spec specifically describes bits that are
> unaffected by hot reset or FLR (see PCIe r5.0, sec 7.4), but the
> PCI_ACS_CTRL register contains no RWS bits, so I don't think that's
> what you mean here.
>
> I suspect you mean something like "kexec doesn't reset the device, so
> the new kernel inherits the PCI_ACS_CTRL settings from Linux"?

Yes, I was using "sticky through kexec reset" as a phrase. I will make
this more clear to read that the PCI_ACS_CTRL settings are not cleared
during kexec.

> So it looks like you're unconditionally disabling ACS during kexec.
> The comment in pci_enable_acs() suggests that ACS may have been
> enabled by platform firmware. Maybe we should *restore* the original
> ACS settings from before Linux enabled ACS rather than clearing them?

I thought about this. I considered following scenarios:

1. Old Linux Kernel (IOMMU on) - > New Linux Kernel (IOMMU on) - ACS
bits that the kernel touches are unaffected whether or not firmware
has enabled these.
2. Old Linux Kernel (IOMMU on) - > New Linux Kernel (IOMMU off) - ACS
bits that the kernel touches are cleared even if the firmware has
enabled these. But, we already do pci_disable_acs_redir() when we boot
with IOMMU off. So this is similar. Also, if the IOMMU is off, I do
not see how ACS is useful unless we kexec a second time and turn the
IOMMU on again. And, if we are kexec-ing into Linux again, we know
that this is a no-op for ACS as in [1].
3. Old Linux Kernel(IOMMU on) -> New Kernel(not Linux) (IOMMU on/ off)
- This just matters if the new kernel even does a read/modify/write.
We really can define no policy here unless we can invent a signalling
mechanism from BIOS.

> > Reset the ACS bits to default before kexec or device remove.
> >
> > Signed-off-by: Deepa Dinamani <deepa.kernel@xxxxxxxxx>
> > ---
> > drivers/pci/pci-driver.c | 4 ++++
> > drivers/pci/pci.c | 39 +++++++++++++++++++++++++++------------
> > drivers/pci/pci.h | 1 +
> > 3 files changed, 32 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > index 0454ca0e4e3f..bd8d08e50b97 100644
> > --- a/drivers/pci/pci-driver.c
> > +++ b/drivers/pci/pci-driver.c
> > @@ -453,6 +453,8 @@ static int pci_device_remove(struct device *dev)
> >
> > /* Undo the runtime PM settings in local_pci_probe() */
> > pm_runtime_put_sync(dev);
> > + /* Undo the PCI ACS settings in pci_init_capabilities() */
> > + pci_disable_acs(pci_dev);
>
> Can this be a separate patch? It doesn't seem to have anything to do
> with kexec, so a different patch with (presumably) a different
> rationale would be good.

We call device_shutdown() during kexec. If the devices are removed,
then the pci_device_shutdown() is not called on such devices. Hence,
the ACS settings we set during the probe() are still active. This is
trying to clear that state. This should be part of the same patch for
completeness, I think. Although this should be moved up as the first
thing this function does. And, a check for the power state you
suggested below.

Also I noticed another thing(when testing) that is ommited in the
patch is that this is not sufficient for bridge devices(without a
driver). I'm thinking pci_remove_bus_device() is a good place for such
devices. Let me know if you have a better recommendation.

> > /*
> > * If the device is still on, set the power state as "unknown",
> > @@ -493,6 +495,8 @@ static void pci_device_shutdown(struct device *dev)
> > */
> > if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot))
> > pci_clear_master(pci_dev);
> > + if (kexec_in_progress)
> > + pci_disable_acs(pci_dev);
>
> Shouldn't this be in the same "if" block as pci_clear_master()? If
> the device is in D3cold, it's not going to work to disable ACS because
> config space isn't accessible.

Thanks. This is something I had missed. I will add this in v2.

> > }
> >
> > #ifdef CONFIG_PM
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index da2e59daad6f..8254617cff03 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -3261,15 +3261,23 @@ static void pci_disable_acs_redir(struct pci_dev *dev)
> > pci_info(dev, "disabled ACS redirect\n");
> > }
> >
> > +
> > +/* Standard PCI ACS capailities
> > + * Source Validation | P2P Request Redirect | P2P Completion Redirect | Upstream Forwarding
>
> Please make this comment fit in 80 columns.

Will do.

> > + */
> > +#define PCI_STD_ACS_CAP (PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF)
> > +
> > /**
> > - * pci_std_enable_acs - enable ACS on devices using standard ACS capabilities
> > + * pci_std_enable_disable_acs - enable/disable ACS on devices using standard
> > + * ACS capabilities
> > * @dev: the PCI device
> > */
> > -static void pci_std_enable_acs(struct pci_dev *dev)
> > +static void pci_std_enable_disable_acs(struct pci_dev *dev, int enable)
>
> Maybe you could split this refactoring into its own patch that doesn't
> actually change any behavior? Then the kexec patch would be a
> one-liner and the device remove patch would be another one-liner, so
> it's obvious where the important changes are.

Sure, I can do it that way.

> > {
> > int pos;
> > u16 cap;
> > u16 ctrl;
> > + u16 val = 0;
> >
> > pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
> > if (!pos)
> > @@ -3278,19 +3286,26 @@ static void pci_std_enable_acs(struct pci_dev *dev)
> > pci_read_config_word(dev, pos + PCI_ACS_CAP, &cap);
> > pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
> >
> > - /* Source Validation */
> > - ctrl |= (cap & PCI_ACS_SV);
> > + val = (cap & PCI_STD_ACS_CAP);
> >
> > - /* P2P Request Redirect */
> > - ctrl |= (cap & PCI_ACS_RR);
> > + if (enable)
> > + ctrl |= val;
> > + else
> > + ctrl &= ~val;
> >
> > - /* P2P Completion Redirect */
> > - ctrl |= (cap & PCI_ACS_CR);
> > + pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
> > +}
> >
> > - /* Upstream Forwarding */
> > - ctrl |= (cap & PCI_ACS_UF);
> > +/**
> > + * pci_disable_acs - enable ACS if hardware support it
>
> s/enable/disable/ (in comment)
> s/support/supports/

OK.