Re: [PATCH v2 1/1] PCI: pciehp: Skip DLLSC handling if DPC is triggered
From: Lukas Wunner
Date: Sun Mar 28 2021 - 06:04:50 EST
On Wed, Mar 17, 2021 at 01:02:07PM -0700, Kuppuswamy, Sathyanarayanan wrote:
> On 3/17/21 12:01 PM, Lukas Wunner wrote:
> > If the events are ignored, the driver of the device in the hotplug slot
> > is not unbound and rebound. So the driver must be able to cope with
> > loss of TLPs during DPC recovery and it must be able to cope with
> > whatever state the endpoint device is in after DPC recovery.
> > Is this really safe? How does the nvme driver deal with it?
>
> During DPC recovery, in pcie_do_recovery() function, we use
> report_frozen_detected() to notify all devices attached to the port
> about the fatal error. After this notification, we expect all
> affected devices to halt its IO transactions.
>
> Regarding state restoration, after successful recovery, we use
> report_slot_reset() to notify about the slot/link reset. So device
> drivers are expected to restore the device to working state after this
> notification.
Thanks a lot for the explanation.
> I am not sure how pure firmware DPC recovery works. Is there a platform
> which uses this combination? For firmware DPC model, spec does not clarify
> following points.
>
> 1. Who will notify the affected device drivers to halt the IO transactions.
> 2. Who is responsible to restore the state of the device after link reset.
>
> IMO, pure firmware DPC does not support seamless recovery. I think after it
> clears the DPC trigger status, it might expect hotplug handler be responsible
> for device recovery.
>
> I don't want to add fix to the code path that I don't understand. This is the
> reason for extending this logic to pure firmware DPC case.
I agree, let's just declare synchronization of pciehp with
pure firmware DPC recovery as unsupported for now.
I've just submitted a refined version of my patch to the list:
https://lore.kernel.org/linux-pci/b70e19324bbdded90b728a5687aa78dc17c20306.1616921228.git.lukas@xxxxxxxxx/
If you could give this new version a whirl I'd be grateful.
This version contains more code comments and kernel-doc.
There's now a check in dpc_completed() whether the DPC Status
register contains "all ones", which can happen when a DPC-capable
hotplug port is hot-removed, i.e. for cascaded DPC-capable hotplug
ports.
I've also realized that the previous version was prone to races
which are theoretical but should nonetheless be avoided:
E.g., previously the DLLSC event was only removed from "events"
if the link is still up after DPC recovery. However if DPC
triggers and recovers multiple times in a row, the link may
happen to be up but a new DLLSC event may have been picked up
in "pending_events" which should be ignored. I've solved this
by inverting the logic such that DLLSC is *always* removed from
"events", and if the link is unexpectedly *down* after successful
recovery, a DLLSC event is synthesized.
Thanks,
Lukas