Re: [PATCH 1/4] PCI/hotplug: Disable HPIE over reset

From: Ilpo Järvinen
Date: Mon Mar 17 2025 - 14:09:10 EST


On Sat, 15 Mar 2025, Lukas Wunner wrote:

> On Thu, Mar 13, 2025 at 04:23:30PM +0200, Ilpo Järvinen wrote:
> > pciehp_reset_slot() disables PDCE (Presence Detect Changed Enable) and
> > DLLSCE (Data Link Layer State Changed Enable) for the duration of reset
> > and clears the related status bits PDC and DLLSC from the Slot Status
> > register after the reset to avoid hotplug incorrectly assuming the card
> > was removed.
> >
> > However, hotplug shares interrupt with PME and BW notifications both of
> > which can make pciehp_isr() to run despite PDCE and DLLSCE bits being
> > off. pciehp_isr() then picks PDC or DLLSC bits from the Slot Status
> > register due to the events that occur during reset and caches them into
> > ->pending_events. Later, the IRQ thread in pciehp_ist() will process
> > the ->pending_events and will assume the Link went Down due to a card
> > change (in pciehp_handle_presence_or_link_change()).
> >
> > Change pciehp_reset_slot() to also clear HPIE (Hot-Plug Interrupt
> > Enable) as pciehp_isr() will first check HPIE to see if the interrupt
> > is not for it. Then synchronize with the IRQ handling to ensure no
> > events are pending, before invoking the reset.
>
> After dwelling on this for a while, I'm thinking that it may re-introduce
> the issue fixed by commit f5eff5591b8f ("PCI: pciehp: Fix AB-BA deadlock
> between reset_lock and device_lock"):
>
> Looking at the second and third stack trace in its commit message,
> down_write(reset_lock) in pciehp_reset_slot() is basically equivalent
> to synchronize_irq() and we're holding device_lock() at that point,
> hindering progress of pciehp_ist().

I found this description somewhat confusing, but now that you mention it,
I can see the problem: if pciehp_reset_slot() calls synchronize_irq(),
draining the pending events can end up trying to acquire device_lock()
again. ->reset_lock seems irrelevant to that problem.

Thus, pciehp_reset_slot() cannot ever rely on completing the processing of
all pending events before it invokes the reset as long as any of its
callers is holding device_lock().
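
To spell out the reasoning above, here is a tiny user-space model (not
kernel code). All names in it are hypothetical: device_lock_model stands in
for device_lock(), and a failed trylock stands in for "the IRQ thread would
block here". It only assumes that the IRQ thread needs the device lock to
finish handling a removal.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for device_lock(); clear means unlocked. */
static atomic_flag device_lock_model = ATOMIC_FLAG_INIT;

static bool model_trylock(void)
{
	return !atomic_flag_test_and_set(&device_lock_model);
}

static void model_unlock(void)
{
	atomic_flag_clear(&device_lock_model);
}

/*
 * Models the IRQ thread handling a presence or link change: it needs
 * the device lock to finish. Returns false where the real thread
 * would block.
 */
static bool ist_model_can_finish(void)
{
	if (!model_trylock())
		return false;
	model_unlock();
	return true;
}

/*
 * Models a reset path entered with the device lock already held that
 * then waits for the IRQ thread to drain its events. Returns whether
 * that wait could ever complete.
 */
static bool drain_completes_with_device_lock_held(void)
{
	bool done;

	model_trylock();		/* caller took the device lock */
	done = ist_model_can_finish();	/* the "synchronize" step */
	model_unlock();
	return done;
}
```

With the lock free, ist_model_can_finish() succeeds; called from within
drain_completes_with_device_lock_held() it cannot, which is the situation
a waiting reset path would be stuck in.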

It's a bit sad, because removing most of the reset_lock complexity would
have been a nice simplification of the locking; effectively, it would have
reverted f5eff5591b8f too.

> So I think I have guided you in the wrong direction and I apologize
> for that.
>
> However it seems to me that this should be solvable with the small
> patch below. Am I missing something?
>
> @Joel Mathew Thomas, could you give the below patch a spin and see
> if it helps?
>
> Thanks!
>
> -- >8 --
>
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index bb5a8d9f03ad..99a2ac13a3d1 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -688,6 +688,11 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
>  		return IRQ_HANDLED;
>  	}
>
> +	/* Ignore events masked by pciehp_reset_slot(). */
> +	events &= ctrl->slot_ctrl;
> +	if (!events)
> +		return IRQ_HANDLED;
> +
>  	/* Save pending events for consumption by IRQ thread. */
>  	atomic_or(events, &ctrl->pending_events);
>  	return IRQ_WAKE_THREAD;

Yes, this should work, I think.

I'm not entirely sure, though, how reading ->slot_ctrl here synchronizes
with pciehp_reset_slot() invoking the reset. What guarantees that
pciehp_isr() sees the updated ->slot_ctrl once pciehp_reset_slot() has
proceeded to invoke the reset? (In general I'm very hesitant to assume a
lockless, barrierless reader is race-free; I might just be paranoid about
it.)
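
To illustrate the kind of ordering I have in mind, here is a user-space
sketch using C11 atomics. It is not a proposal for pciehp and says nothing
about what the hardware or the IRQ core already guarantee. Every name
(ctrl_model, isr_model(), the EV_* bits) is made up, and the model simply
assumes that the enable bits and the event bits share bit positions. The
writer publishes the mask with release semantics and the reader loads it
with acquire semantics before filtering.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical event/enable bits, sharing positions in this model. */
#define EV_PRESENCE	0x1u
#define EV_LINK		0x2u
#define EV_OTHER	0x4u

struct ctrl_model {
	_Atomic uint16_t slot_ctrl;		/* currently enabled events */
	_Atomic uint32_t pending_events;	/* consumed by the IRQ thread */
};

enum isr_ret { ISR_HANDLED, ISR_WAKE_THREAD };

static void ctrl_model_init(struct ctrl_model *c, uint16_t enabled)
{
	atomic_init(&c->slot_ctrl, enabled);
	atomic_init(&c->pending_events, 0);
}

/* Reset path: mask events before the reset, with release ordering. */
static void mask_for_reset(struct ctrl_model *c, uint16_t mask)
{
	atomic_fetch_and_explicit(&c->slot_ctrl, (uint16_t)~mask,
				  memory_order_release);
}

/* Reset path: re-enable the events once the reset has finished. */
static void unmask_after_reset(struct ctrl_model *c, uint16_t mask)
{
	atomic_fetch_or_explicit(&c->slot_ctrl, mask, memory_order_release);
}

/* Hard IRQ handler: drop events that are currently masked. */
static enum isr_ret isr_model(struct ctrl_model *c, uint16_t events)
{
	events &= atomic_load_explicit(&c->slot_ctrl, memory_order_acquire);
	if (!events)
		return ISR_HANDLED;

	/* Save pending events for consumption by the IRQ thread. */
	atomic_fetch_or_explicit(&c->pending_events, events,
				 memory_order_relaxed);
	return ISR_WAKE_THREAD;
}
```

Run single-threaded, this only demonstrates the masking itself: a masked
event is dropped and an unmasked one wakes the thread. Whether the real
handler needs any such pairing is exactly the question above.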

--
i.