Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
From: Sinan Kaya
Date: Tue Jul 10 2018 - 14:50:07 EST
On Mon, Jul 9, 2018 at 12:00 PM, Lukas Wunner <lukas@xxxxxxxxx> wrote:
>
> On Mon, Jul 09, 2018 at 08:48:44AM -0600, Sinan Kaya wrote:
> > On 7/8/18, Lukas Wunner <lukas@xxxxxxxxx> wrote:
> > > On Tue, Jul 03, 2018 at 11:43:26AM -0400, Sinan Kaya wrote:
> > > > My solution doesn't help if link down interrupt is observed before the
> > > > AER or DPC services.
> > >
> > > If pciehp gets an interrupt quicker than dpc/aer, it will (at least with
> > > my patches) remove all devices, check if the presence bit is set,
> > > and if so, try to bring the slot up again.
> >
> > Hotplug driver should only observe a link down interrupt. Link would
> > come up in response to a secondary bus reset initiated by the AER
> > driver.
>
> PCIe hotplug doesn't have separate Link Down and Link Up interrupts,
> there is only a Link State *Changed* event.
>
> > Can you point me to the code that would bring up the link in hp code?
>
> I was referring to the situation with my recently posted pciehp patches
> applied, in particular patch [21/32] ("PCI: pciehp: Become resilient to
> missed events"):
> https://patchwork.ozlabs.org/patch/930389/
>
> When I get a presence or link changed event, I turn the slot off. That
> includes removing all devices in the slot. Because even if the slot is
> still occupied or link is up, there was definitely a change and the safe
> behavior is to assume that the card in the slot is now a different one
> than before.
>
We do have a bit of a mess, unfortunately. The error handling and hotplug
drivers do not play nicely with each other.
When the hotplug driver observes a link down, we do not check whether it
happened because the user really wanted to remove a card or because it
was caused by an error handling service such as AER/DPC.
I'm thinking that we could check whether a hotplug event is pending on
entry to fatal error handling. If one is pending, we could poll until
the status bit clears. That should flush the link down event.
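A rough sketch of that idea, written as a userspace model rather than kernel code. The register is simulated, and `fake_slot`, `read_slot_status()` and `flush_pending_link_event()` are made-up names; only the bit value mirrors PCI_EXP_SLTSTA_DLLSC from pci_regs.h. It assumes the hotplug driver clears the bit once it has serviced the event:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit value mirrors PCI_EXP_SLTSTA_DLLSC (Data Link Layer State Changed). */
#define SLTSTA_DLLSC 0x0100

/* Hypothetical stand-in for a hotplug port; not a kernel structure. */
struct fake_slot {
	uint16_t slot_status;	/* simulated Slot Status register */
	int polls_until_clear;	/* reads before "pciehp" acks; 0 = never */
};

/* Simulated register read: models pciehp clearing DLLSC after N reads. */
static uint16_t read_slot_status(struct fake_slot *slot)
{
	uint16_t val = slot->slot_status;

	if (slot->polls_until_clear > 0 && --slot->polls_until_clear == 0)
		slot->slot_status &= (uint16_t)~SLTSTA_DLLSC;
	return val;
}

/*
 * Meant to run on entry to fatal error handling: if a link state change
 * is pending, poll until it has been handled or until we give up.
 * Returns true when no link event is pending any more.
 */
static bool flush_pending_link_event(struct fake_slot *slot, int max_polls)
{
	assert(slot != NULL && max_polls > 0);

	for (int i = 0; i < max_polls; i++) {
		if (!(read_slot_status(slot) & SLTSTA_DLLSC))
			return true;
		/* real code would sleep between reads, e.g. msleep() */
	}
	return false;	/* still pending; caller must decide what to do */
}
```

The poll is bounded on purpose: if the hotplug driver never acknowledges the event, error handling should not hang forever waiting for it.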
Even then, the hotplug driver's handling of a link down indication seems
to turn off slot power and the LED.
If the AER/DPC service runs after the hotplug driver, the link won't come
back up because power to the slot has been turned off.
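To make the ordering problem concrete, here is a toy model in plain C (not kernel code). `toy_slot`, `hotplug_handle_link_down()` and `error_recovery_reset()` are invented for illustration, and the model assumes that a secondary bus reset can only retrain the link while the slot still has power:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slot state; not a kernel structure. */
struct toy_slot {
	bool power_on;
	bool link_up;
};

/* Models what the hotplug driver appears to do on a link down indication. */
static void hotplug_handle_link_down(struct toy_slot *s)
{
	assert(s != NULL);
	s->link_up = false;
	s->power_on = false;	/* slot power (and LED) turned off */
}

/*
 * Models AER/DPC recovery: the reset retrains the link only if the slot
 * is still powered. Returns true if the link came back up.
 */
static bool error_recovery_reset(struct toy_slot *s)
{
	assert(s != NULL);
	s->link_up = s->power_on;
	return s->link_up;
}
```

In this model, calling `error_recovery_reset()` on a powered slot brings the link back, while calling `hotplug_handle_link_down()` first leaves the reset with nothing to retrain.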
I'd like to hear Bjorn's opinion before we throw something else
at this problem.
> Afterwards, I check if the slot is occupied or link is up. If either
> of those conditions is true, I try to bring the slot up again.
>
> Thanks,
>
> Lukas