On Tue, Jul 03, 2018 at 07:30:28AM -0400, okaya@xxxxxxxxxxxxxx wrote:
On 2018-07-03 04:34, Lukas Wunner wrote:
>On Mon, Jul 02, 2018 at 06:52:47PM -0400, Sinan Kaya wrote:
>>If a bridge supports hotplug and observes a PCIe fatal error, the
>>following
>>events happen:
>>
>>1. AER driver removes the devices from PCI tree on fatal error
>>2. AER driver brings down the link by issuing a secondary bus reset
>>waits
>>for the link to come up.
>>3. Hotplug driver observes a link down interrupt
>>4. Hotplug driver tries to remove the devices waiting for the rescan
>>lock
>>but devices are already removed by the AER driver and AER driver is
>>waiting
>>for the link to come back up.
>>5. AER driver tries to re-enumerate devices after polling for the link
>>state to go up.
>>6. Hotplug driver obtains the lock and tries to remove the devices
>>again.
>>
>>If a bridge is a hotplug capable bridge, mask hotplug interrupts before
>>the
>>reset and unmask afterwards.
>
>Would it work for you if you just amended the AER driver to skip
>removal and re-enumeration of devices if the port is a hotplug bridge?
>Just check for is_hotplug_bridge in struct pci_dev.
The reason why we want to remove devices before secondary bus reset is to
quiesce pcie bus traffic before issuing a reset.
Skipping this step might cause transactions to be lost in the middle of the
reset as there will be active traffic flowing and drivers will suddenly
start reading ffs.
Interesting, I think that merits a code comment.
FWIW, macOS has a "PCI pause" callback to quiesce a device:
https://opensource.apple.com/source/IOPCIFamily/IOPCIFamily-239.1.2/pause.rtf
They're using it to reconfigure a device's BAR and bus number
at runtime (sic!), e.g. if mmio windows need to be moved around
on Thunderbolt hotplug if there's insufficient space:
"During pause reconfiguration, the following may be changed:
- device BAR registers
- the devices bus number
- registry properties reflecting these values ("ranges",
"assigned-addresses", "reg")
- device MSI block values for address and value, but not the
number of MSIs allocated"
Conceptually, "PCI pause" is similar to putting the device in a suspend
state. I'm wondering if suspending the devices below the bridge would
make more sense than removing them in the AER driver.
Lukas