Re: [PATCH v2 2/4] PCI/DPC/AER: Address Concurrency between AER and DPC

From: Keith Busch
Date: Tue Jan 02 2018 - 12:08:52 EST


On Tue, Jan 02, 2018 at 08:25:08AM -0500, Sinan Kaya wrote:
> > 2. A DPC event suppresses the error message required for the Linux
> > AER driver to run. How can AER and DPC run concurrently?
> >
>
> As we briefly discussed in previous email exchanges, I think you are
> looking at a use case with a switch that supports DPC functionality.

No, I'm interested in DPC in a general.

> Oza and I are looking at a root port functionality with DPC feature.
>
> As you already know, AER errors are logged to AER capability register
> independent of the DPC driver presence.

The error is noted in the Uncorrectable Error Status Register if that's
what triggered the DPC event. This register has nothing to do with the
Root Error Status Register, which is required to have received an error
Message in order to have a status for the AER driver.

> A root port is also allowed to share the MSI interrupts across DPC and
> AER.
>
> Therefore, when a DPC interrupt fires; both AER driver and DPC driver
> starts recovery work. This is the issue we are trying to deal with.

If DPC is implemented correctly, the AER Root Status can't have an
uncorrectable status for the driver to deal with. The only thing the AER
driver could possibly see is a correctable error if DPC ERR_COR Enable
is set.

> In the end, the driver needs to work for both root port and switches.
> I think you verified it against a switch. We are doing the same for a
> root port and submitting the plumbing code.

I think we need to consider the possibility you are enabling a platform
that implemented DPC incorrectly. There's nothing in the specification
that says that DPC enabled root ports are not to discard the error message
if it came from downstream, or skip signalling the message for root port
detected errors.