Re: [PATCH v2] PCI/EDR: Clear PCIe Device Status errors after EDR error recovery

From: Bjorn Helgaas
Date: Fri Mar 31 2023 - 11:10:26 EST


On Thu, Mar 30, 2023 at 11:46:45PM -0700, Sathyanarayanan Kuppuswamy wrote:
> On 3/30/23 8:45 AM, Bjorn Helgaas wrote:

> > This sounds like a plausible assumption. But is there actually spec
> > language that says EDR notification is not allowed in the AER native
> > case (when OS owns the AER Capability)? I looked but didn't find
> > anything.
>
> In the PCIe firmware specification v3.3, table "Table 4-6: Interpretation of
> the _OSC Control Field, Returned Value", field "PCI Express Downstream Port
> Containment configuration control", it explains that the firmware can use
> EDR notification only when OS DPC control is not requested or denied by
> firmware.

I'm sure that's the intent, but I don't see that restriction in the
spec. Here's what I'm looking at, which doesn't directly restrict
generation of EDR notifications:

  If control of this feature was requested and denied, or was not
  requested, firmware is responsible for initializing Downstream Port
  Containment Extended Capability Structures per firmware policy.
  Further, [the OS is permitted to write several registers while
  processing an EDR notification]
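
As a rough illustration of the condition that text describes (not of
any restriction on EDR itself), the "requested and denied, or not
requested" test could be sketched as below. The helper name is
hypothetical, and the bit value is assumed to match Linux's
OSC_PCI_EXPRESS_DPC_CONTROL:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror Linux's OSC_PCI_EXPRESS_DPC_CONTROL; illustrative only. */
#define OSC_DPC_CONTROL 0x00000080u

/*
 * Hypothetical helper: returns true when firmware remains responsible
 * for initializing the DPC capability, i.e. the OS either did not
 * request DPC control or requested it and was denied.
 */
bool firmware_owns_dpc(uint32_t requested, uint32_t granted)
{
	bool os_requested = (requested & OSC_DPC_CONTROL) != 0;
	bool os_granted = os_requested && (granted & OSC_DPC_CONTROL) != 0;

	return !os_granted;
}
```

Note that this only says who initializes DPC; it says nothing about
whether firmware may send an EDR notification.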

> > Actually I do have one idea: in the firmware-first case, firmware
> > collects all the status information, clears it, and then passes the
> > status on to the OS. In this case we don't need to clear the status
> > registers in handle_error_source(), pcie_do_recovery(), etc.
>
> So the idea is to get the error info in a particular format using
> something like _DSM call?

No, that's not what I'm thinking at all. I definitely would not want
to add a new _DSM, which would add yet another case the OS has to
handle.

In the firmware-first case, the firmware collects the error status and
clears it before handing the info off to the OS error handling path.

In the native case, the OS should be able to collect the error status
and clear it before starting the OS error handling path. The register
accesses are the same and should be indistinguishable from the
device's point of view; the only difference is that the OS performs
them instead of firmware.
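
A minimal sketch of that "collect, clear, then handle" ordering, using
a made-up in-memory model of the status registers rather than real
config accesses. All names here (fake_dev, err_snapshot,
collect_and_clear, handle_error) are hypothetical and not existing
kernel interfaces:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Device Status error bits 3:0 (CED, NFED, FED, URD). */
#define DEVSTA_ERR_MASK 0x000fu

/* Hypothetical model of a device's write-1-to-clear status registers. */
struct fake_dev {
	uint16_t devsta;       /* PCIe Device Status */
	uint32_t uncor_status; /* AER Uncorrectable Error Status */
	uint32_t cor_status;   /* AER Correctable Error Status */
};

/* Error status captured before any handling starts. */
struct err_snapshot {
	uint16_t devsta;
	uint32_t uncor_status;
	uint32_t cor_status;
};

/* RW1C semantics: each 1 bit written clears the matching status bit. */
void rw1c_write16(uint16_t *reg, uint16_t val) { *reg &= (uint16_t)~val; }
void rw1c_write32(uint32_t *reg, uint32_t val) { *reg &= ~val; }

/* Collect the status, then clear exactly the bits that were captured. */
struct err_snapshot collect_and_clear(struct fake_dev *dev)
{
	struct err_snapshot s;

	assert(dev != NULL);

	s.devsta = dev->devsta & DEVSTA_ERR_MASK;
	s.uncor_status = dev->uncor_status;
	s.cor_status = dev->cor_status;

	rw1c_write16(&dev->devsta, s.devsta);
	rw1c_write32(&dev->uncor_status, s.uncor_status);
	rw1c_write32(&dev->cor_status, s.cor_status);

	return s;
}

/*
 * The handling path sees only the snapshot and never touches the
 * status registers. Returns 1 if an uncorrectable error was recorded.
 */
int handle_error(const struct err_snapshot *s)
{
	return s->uncor_status != 0;
}
```

The point of the split is that handle_error() works the same whether
firmware or the OS produced the snapshot, so the later recovery steps
would not need their own status-clearing logic.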

Bjorn