Re: [PATCH v1 1/1] PCI/ERR: Handle fatal error recovery for non-hotplug capable devices

From: Oliver O'Halloran
Date: Tue May 26 2020 - 23:35:21 EST


On Wed, May 27, 2020 at 1:06 PM Kuppuswamy, Sathyanarayanan
<sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx> wrote:
>
> Yes, in case of DPC (Fatal errors) link is already reset. So we
> don't need any special handling. This reset logic is mainly for
> non-fatal errors.

Why? In our experience most fatal errors aren't all that fatal and can
be recovered by resetting the device. The base spec backs that up (see
gen5 base, sec 6.2) too saying the main point of distinction between
fatal and non-fatal errors is whether handling the error requires a
reset or not. For EEH we always try to recover the device and only
mark it as permanently failed once the devices goes over the max error
threshold (5 errors per hour, by default). Doing something similar for
(native) DPC would make sense IMO.