Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86

From: Benjamin Herrenschmidt
Date: Wed Jul 15 2020 - 18:52:32 EST


On Wed, 2020-07-15 at 17:12 -0500, Bjorn Helgaas wrote:
> > I've 'played' with PCIe error handling - without much success.
> > What might be useful is for a driver that has just read ~0u to
> > be able to ask 'has there been an error signalled for this device?'.
>
> In many cases a driver will know that ~0 is not a valid value for the
> register it's reading. But if ~0 *could* be valid, an interface like
> you suggest could be useful. I don't think we have anything like that
> today, but maybe we could. It would certainly be nice if the PCI core
> noticed, logged, and cleared errors. We have some of that for AER,
> but that's an optional feature, and support for the error bits in the
> garden-variety PCI_STATUS register is pretty haphazard. As you note
> below, this sort of SERR/PERR reporting is frequently hard-wired in
> ways that takes it out of our purview.

We do have pci_channel_state (via pci_channel_offline()) which covers
the cases where the underlying error handling (such as EEH or unplug)
results in the device being offlined though this tend to be
asynchronous so it might take a few ~0's before you get it.

It's typically used to break potentially infinite loops in some
drivers.

There is no interface to check whether *an* error happened though for
the most cases it will be captured in the status register, which is
harvested (and cleared ?) by some EDAC drivers iirc...

All this lacks coordination, I agree.

Cheers,
Ben.