Re: [PATCH v11 5/8] PCI/AER: Allow clearing Error Status Register in FF mode

From: Bjorn Helgaas
Date: Mon Dec 30 2019 - 19:01:36 EST


[+cc Austin]

On Thu, Dec 26, 2019 at 04:39:11PM -0800, sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx wrote:
> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx>
>
> As per PCI firmware specification r3.2 System Firmware Intermediary
> (SFI) _OSC and DPC Updates ECR
> (https://members.pcisig.com/wg/PCI-SIG/document/13563),

What is the state of this ECR? I see it in the "PCI Express Review
Zone Archive". I don't know what the usage is of the "Review Zone" vs
the "Review Zone Archive / PCI Express Review Zone Archive". AFAICS,
it is not listed in any of the "Documents for 60 Day Member Review".

And I think it needs some clarification (for one thing, it needs to
say what the red/blue text means). I've mentioned some other items to
Austin, but I haven't read it in detail because it seems like it's not
quite baked yet.

E.g., there's language about "it may make sense for an embedded system
OS to own SFI, but it's recommended that general-purpose OSes never
request SFI ownership." That's useless: Linux is certainly a general
purpose OS, but Linux is also often an embedded OS. So the ECR
doesn't provide useful guidance about how an OS should decide whether
to request SFI ownership.

Making code changes based on a published spec or ECN is fine,
obviously. Changes based on an ECR that is well on track to being
accepted, e.g., is in the 60-day review period, are probably OK. I
don't yet have warm fuzzies about this ECR because I have no idea how
far along it is.

We might be able to justify some of these changes based on other
specs; it just sounds weird to me to say "based on this Engineering
Change Request that might be accepted someday, we must do X". Anybody
can dream up an ECR that says anything at all, so AFAICT, an ECR is
not at all authoritative.

> sec titled
> "DPC Event Handling Implementation Note", page 10, Error Disconnect
> Recover (EDR) support allows OS to handle error recovery and clearing
> Error Registers even in FF mode. So create exception for FF mode checks
> in pci_cleanup_aer_uncorrect_error_status(), pci_aer_clear_fatal_status()
> and pci_cleanup_aer_error_status_regs() functions when its being called
> from DPC code path.