Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected

From: Greg Kroah-Hartman
Date: Thu Nov 08 2018 - 17:13:28 EST


On Thu, Nov 08, 2018 at 02:09:17PM -0600, Bjorn Helgaas wrote:
> [+cc Jonathan, Greg, Lukas, Russell, Sam, Oliver for discussion about
> PCI error recovery in general]
>
> On Wed, Nov 07, 2018 at 05:42:57PM -0600, Bjorn Helgaas wrote:
> > On Tue, Sep 18, 2018 at 05:15:00PM -0500, Alexandru Gagniuc wrote:
> > > When a PCI device is gone, we don't want to send IO to it if we can
> > > avoid it. We expose functionality via the irq_chip structure. As
> > > users of that structure may not know about the underlying PCI device,
> > > it's our responsibility to guard against removed devices.
> > >
> > > .irq_write_msi_msg() is already guarded inside __pci_write_msi_msg().
> > > .irq_mask/unmask() are not. Guard them for completeness.
> > >
> > > For example, surprise removal of a PCIe device triggers teardown. This
> > > touches the irq_chips ops some point to disable the interrupts. I/O
> > > generated here can crash the system on firmware-first machines.
> > > Not triggering the IO in the first place greatly reduces the
> > > possibility of the problem occurring.
> > >
> > > Signed-off-by: Alexandru Gagniuc <mr.nuke.me@xxxxxxxxx>
> >
> > Applied to pci/misc for v4.21, thanks!
>
> I'm having second thoughts about this. One thing I'm uncomfortable
> with is that sprinkling pci_dev_is_disconnected() around feels ad hoc
> instead of systematic, in the sense that I don't know how we convince
> ourselves that this (and only this) is the correct place to put it.

My stance has always been that this call is not good at all: once you
call it, you never really know whether the result is still true, because
the device could have been removed right afterward.

So almost any code that relies on it is broken: there is no locking, so
it can and will race, and you will lose.

I think your patch suffers from this race:

> +static u32 mmio_readl(struct pci_dev *dev, const volatile void __iomem *addr)
> +{
> + u32 val, id;
> +
> + if (pci_dev_is_disconnected(dev))
> + return ~0;

Great, but what happens if I yank the device out right here?

> + val = readl(addr);

This value could now be all FFs if the device is gone, so what did the
check above help with?
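
To make the window concrete, here is a minimal userspace sketch of the
check-then-read pattern. It is not kernel code: struct sim_dev,
sim_readl(), sim_yank() and guarded_readl() are hypothetical names
invented for illustration, and the "between" hook only stands in for a
surprise removal landing between the check and the read.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a PCI device; not a kernel structure. */
struct sim_dev {
	bool disconnected;	/* cached "device is gone" flag */
	bool present;		/* whether the hardware is really there */
	uint32_t reg;		/* register contents while present */
};

/* Simulated MMIO read: a missing device reads back as all ones. */
static uint32_t sim_readl(const struct sim_dev *dev)
{
	return dev->present ? dev->reg : UINT32_MAX;
}

/* Hypothetical hook type standing in for an asynchronous event. */
typedef void (*sim_hook_t)(struct sim_dev *dev);

/* Surprise removal: the hardware vanishes, the flag is not updated yet. */
static void sim_yank(struct sim_dev *dev)
{
	dev->present = false;
}

/* Check-then-read, with an optional event between the two steps. */
static uint32_t guarded_readl(struct sim_dev *dev, sim_hook_t between)
{
	if (dev->disconnected)
		return UINT32_MAX;

	if (between)
		between(dev);	/* the device can go away right here */

	return sim_readl(dev);	/* all ones despite the passed check */
}
```

With sim_yank passed as the hook, the check passes and the read still
comes back as all ones, so the caller has to cope with that value
whether or not the flag was tested first.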

> + /*
> + * If an MMIO read from the device returns ~0 data, that data may
> + * be valid, or it may indicate a bus error. If config space is
> + * readable, assume it's valid data; otherwise, assume a bus error.
> + */
> + if (val == ~0) {
> + pci_read_config_dword(dev, PCI_VENDOR_ID, &id);
> + if (id == ~0)
> + pci_dev_set_disconnected(dev, NULL);

So why do the check above for "is disconnected"? What does this buy us
here, just short-circuiting the readl()?

> + }
> +
> + return val;
> +}
> +
> +static void mmio_writel(struct pci_dev *dev, u32 val,
> + volatile void __iomem *addr)
> +{
> + if (pci_dev_is_disconnected(dev))
> + return;
> +
> + writel(val, addr);

Why even check, what's wrong with always doing the write?

I understand the wish to make this easier, but I think the only way is
for the drivers themselves to check their reads. And they have to check
all reads, or at least some subset of reads, and be able to handle 0xff
for the other ones without going crazy.
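
As a rough illustration of a driver checking its own reads, here is a
minimal userspace sketch. It is not taken from any real driver: struct
fake_dev, fake_readl(), drv_read_status(), drv_wait_ready() and
DRV_READY are hypothetical names, and it assumes all ones is never a
valid value for this particular register, which does not hold for
every register.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define DRV_READY 0x1u	/* hypothetical "ready" bit in the status register */

/* Hypothetical stand-in for a device with one status register. */
struct fake_dev {
	bool present;
	uint32_t status;
};

/* Simulated MMIO read: a missing device reads back as all ones. */
static uint32_t fake_readl(const struct fake_dev *dev)
{
	return dev->present ? dev->status : UINT32_MAX;
}

/*
 * Checked read: returns 0 and stores the value, or -ENODEV when the read
 * comes back as all ones (assumed invalid for this register).
 */
static int drv_read_status(const struct fake_dev *dev, uint32_t *status)
{
	uint32_t val = fake_readl(dev);

	if (val == UINT32_MAX)
		return -ENODEV;

	*status = val;
	return 0;
}

/*
 * Bounded poll: gives up after max_polls reads instead of spinning
 * forever, and bails out at once if the device has gone away.
 */
static int drv_wait_ready(const struct fake_dev *dev, int max_polls)
{
	uint32_t status;
	int ret;

	while (max_polls-- > 0) {
		ret = drv_read_status(dev, &status);
		if (ret)
			return ret;
		if (status & DRV_READY)
			return 0;
	}

	return -ETIMEDOUT;
}
```

The check sits after the read, so there is no window between the check
and the access. The poll loop is bounded, so a register stuck at all
ones or at zero cannot hang the caller.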

I _think_ the xhci driver does this, given that it is hot-added/removed
all the time due to the way modern laptops are made, where the BIOS
adds/removes the xhci controller when a USB device is added/removed.

thanks,

greg k-h