Re: [PATCH v3 1/1] PCI/ERR: Fix reset logic in pcie_do_recovery() call

From: Sinan Kaya
Date: Mon Sep 28 2020 - 14:02:40 EST


On 9/28/2020 1:15 PM, Kuppuswamy, Sathyanarayanan wrote:
> Since there is no state restoration for FATAL errors, I am wondering
> whether calls to ->error_detected(), ->mmio_enabled() and ->slot_reset()
> are required?

Good question.

When we started, we were trying to handle both NON_FATAL and FATAL
errors in DPC.

We saw value in unifying DPC with AER's callback mechanism. It looks
like that no longer applies to DPC.

Some drivers want these indications so that they can stop outgoing DMA
and timers, which lets the system recover quickly.

There is value in calling them in the existing AER-based design.

I agree that it no longer applies here if we are going to remove the
device driver. Maybe you should stop calling pcie_do_recovery() in DPC
as well.

>
> Let me know your comments about following pseudo code.
>
> if (fatal error & hotplug_supported)
>    do nothing // if fatal triggered by DPC, clear DPC state.
>
> if (fatal error & no-hotplug)
>   perform slot_reset and re-enumerate affected devices.

LGTM.

I apologize for calling this slot_reset, but slot_reset in the err.c
code is the post-recovery callback to endpoint drivers. Let's stop using
that term here so we don't confuse ourselves.

What we want instead is to remove the devices and rescan, similar to
what hotplug removal and insertion notifications eventually do.

All of this should be done in the DPC driver without any err.c
involvement.

Bjorn,

What do you think? Is this a good direction?

Sinan