Re: [PATCH v6 1/2] acpi: apei: Rename ghes_severity() to ghes_cper_severity()

From: Luck, Tony
Date: Tue May 22 2018 - 13:39:41 EST


On Tue, May 22, 2018 at 08:10:47PM +0200, Rafael J. Wysocki wrote:
> > PCIe fatal means that the link or the device is broken.
>
> And that may really mean that the component in question is on fire.
> We just don't know.

Components on fire could be the root cause of many errors. If we really
believe that is a problem we should power the system off rather than
just calling panic() [not just for PCIe errors, but also for machine
checks, and perhaps a bunch of other places in the kernel].

True story: I used to work for Stratus Computer on fault-tolerant
systems. A customer once called in with a "my computer is on fire"
report and asked what to do. The support person told them to power it
off. The customer asked, "Isn't there something else? It's still running
just fine".

-Tony