Re: [PATCH v11 5/8] PCI/AER: Allow clearing Error Status Register in FF mode

From: Kuppuswamy Sathyanarayanan
Date: Tue Dec 31 2019 - 13:13:52 EST


Hi Bjorn,

On 12/30/19 3:59 PM, Bjorn Helgaas wrote:
[+cc Austin]

On Thu, Dec 26, 2019 at 04:39:11PM -0800, sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx wrote:
From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx>

As per PCI firmware specification r3.2 System Firmware Intermediary
(SFI) _OSC and DPC Updates ECR
(https://members.pcisig.com/wg/PCI-SIG/document/13563),
What is the state of this ECR? I see it in the "PCI Express Review
Zone Archive". I don't know what the usage is of the "Review Zone" vs
the "Review Zone Archive / PCI Express Review Zone Archive". AFAICS,
it is not listed in any of the "Documents for 60 Day Member Review".

And I think it needs some clarification (for one thing, it needs to
say what the red/blue text means). I've mentioned some other items to
Austin, but I haven't read it in detail because it seems like it's not
quite baked yet.

E.g., there's language about "it may make sense for an embedded system
OS to own SFI, but it's recommended that general-purpose OSes never
request SFI ownership." That's useless: Linux is certainly a general
purpose OS, but Linux is also often an embedded OS. So the ECR
doesn't provide useful guidance about how an OS should decide whether
to request SFI ownership.
This ECR has merged three different change proposals (SFI related,
_OSC related updates and update to implementation note of DPC
handling with EDR support) into a single document. Out of these
three changes, we only care about "DPC implementation note update".

We already have a ECR specification for Error Disconnect Recover (EDR)
support (https://members.pcisig.com/wg/PCI-SIG/document/12888) in published
spec section. But this document has some ambiguous statements / missing details
which as clarified in the implementation note section of mentioned ECR.

Making code changes based on a published spec or ECN is fine,
obviously. Changes based on an ECR that is well on track to being
accepted, e.g., is in the 60-day review period, are probably OK. I
don't yet have warm fuzzies about this ECR because I have no idea how
far along it is.

We might be able to justify some of these changes based on other
specs; it just sounds weird to me to say "based on this Engineering
Change Request that might be accepted someday, we must do X". Anybody
can dream up an ECR that says anything at all, so AFAICT, an ECR is
not at all authoritative.

sec titled
"DPC Event Handling Implementation Note", page 10, Error Disconnect
Recover (EDR) support allows OS to handle error recovery and clearing
Error Registers even in FF mode. So create exception for FF mode checks
in pci_cleanup_aer_uncorrect_error_status(), pci_aer_clear_fatal_status()
and pci_cleanup_aer_error_status_regs() functions when its being called
from DPC code path.

--
Sathyanarayanan Kuppuswamy
Linux kernel developer