Re: [PATCH v12 0/6] Address error and recovery for AER and DPC
From: Keith Busch
Date: Mon Mar 12 2018 - 13:54:39 EST
On Mon, Mar 12, 2018 at 01:41:07PM -0400, Sinan Kaya wrote:
> I was just writing a reply to you. You acted first :)
>
> On 3/12/2018 1:33 PM, Keith Busch wrote:
> >>> After releasing a slot from DPC, the link is allowed to retrain. If there
> >>> is a working device on the other side, a link up event occurs. That
> >>> event is handled by the pciehp driver, and that schedules enumeration
> >>> no matter what you do to the DPC driver.
> >> Yes, that is the current behavior, but this patch set makes DPC aware of
> >> the error handling driver callbacks.
> > I've been questioning the utility of doing that since the very first
> > version of this patch set.
> >
>
> I think we should all agree that shutting down the device drivers with active
> work is not safe. There could be outstanding work that the endpoint driver
> needs to take care of.
>
> That was the motivation for this change so that we give endpoint drivers an
> error callback when something goes wrong.
>
> The rest is implementation detail that we can all figure out.
I'm not sure if I agree here. All Linux device drivers are supposed to
cope with sudden/unexpected loss of communication at any time. This
includes cleaning up appropriately when requested to unbind from an
inaccessible device with active outstanding work.
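
To make that expectation concrete, here is a rough userspace sketch of an
unbind path that does not depend on the hardware answering. It is not kernel
code: every name in it (fake_dev, fake_read_status, fake_remove) is
hypothetical. The only hardware fact it leans on is that reads from an
inaccessible PCIe device return all 1s.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define ALL_ONES 0xFFFFFFFFu /* reads from an inaccessible PCIe device return all 1s */

struct fake_dev {
	bool accessible;     /* false once the link has gone down */
	uint32_t status_reg; /* value the device would report if reachable */
	size_t outstanding;  /* requests submitted but not yet completed */
	size_t failed;       /* requests completed with an error */
};

/* Model an MMIO read: a device that is gone reads back as all 1s. */
static uint32_t fake_read_status(const struct fake_dev *dev)
{
	return dev->accessible ? dev->status_reg : ALL_ONES;
}

/*
 * Unbind path. It must finish whether or not the device responds.
 * Treating all 1s as "device gone" is a heuristic in this sketch.
 */
static void fake_remove(struct fake_dev *dev)
{
	if (fake_read_status(dev) == ALL_ONES) {
		/* Device is unreachable: fail outstanding work, do not wait for it. */
		dev->failed += dev->outstanding;
		dev->outstanding = 0;
		return;
	}

	/* Device is reachable: model an orderly drain where requests complete. */
	dev->outstanding = 0;
}
```

The point of the sketch is only that the remove path terminates and accounts
for every outstanding request either way, without waiting on the device.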