Re: [PATCH] PCI: pciehp: Avoid returning prematurely from sysfs requests

From: Lukas Wunner
Date: Fri Aug 09 2019 - 14:38:42 EST


On Fri, Aug 09, 2019 at 10:28:15AM -0700, sathyanarayanan kuppuswamy wrote:
> On 8/9/19 3:28 AM, Lukas Wunner wrote:
> > A sysfs request to enable or disable a PCIe hotplug slot should not
> > return before it has been carried out. That is sought to be achieved
> > by waiting until the controller's "pending_events" have been cleared.
> >
> > However the IRQ thread pciehp_ist() clears the "pending_events" before
> > it acts on them. If pciehp_sysfs_enable_slot() / _disable_slot() happen
> > to check the "pending_events" after they have been cleared but while
> > pciehp_ist() is still running, the functions may return prematurely
> > with an incorrect return value.
>
> Can this be fixed by changing the sequence of clearing the pending_events in
> pciehp_ist() ?

It can't. The processing logic is such that pciehp_ist() atomically
removes bits from pending_events and acts upon them. Simultaneously, new
events may be queued up by adding bits to pending_events (through a
hardirq handled by pciehp_isr(), through a sysfs request, etc).
Those will be handled in an additional iteration of pciehp_ist().

If I delayed removing bits from pending_events, I couldn't tell whether
new events have accumulated while others were being processed.
E.g. a PDS event may occur while another one is being processed.
The second PDS event may signify a card removal immediately after
the card has been brought up. It's crucial not to lose the second PDS
event but to act properly on it by bringing the slot down again.
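
To illustrate, here is a minimal userspace model of that take-then-act
loop. It uses C11 atomics rather than the kernel's atomic_t API, and
the event bits and function names are made up for the sketch; they are
not the driver's. The "during" argument stands in for an event that
arrives while the first batch is still being processed.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical event bits, not the driver's actual values. */
#define EV_PRESENCE 0x1u
#define EV_LINK     0x2u

static atomic_uint pending_events;

/* Producer side: models the hardirq handler or a sysfs request. */
static void queue_events(unsigned int ev)
{
	atomic_fetch_or(&pending_events, ev);
}

/* Consumer side: take all pending bits in one atomic step. */
static unsigned int take_events(void)
{
	return atomic_exchange(&pending_events, 0);
}

/*
 * Model of the IRQ thread. Returns the number of iterations needed
 * until no events are left. If "during" is nonzero, those events are
 * queued once, in the middle of the first iteration.
 */
static int run_thread(unsigned int during)
{
	int iterations = 0;
	unsigned int events;

	while ((events = take_events()) != 0) {
		iterations++;
		/* ... act on "events" here ... */
		if (during) {
			queue_events(during);
			during = 0;
		}
	}
	/* Holds in this single-threaded model only. */
	assert(atomic_load(&pending_events) == 0);
	return iterations;
}
```

Because the bits were already taken out of pending_events before the
second presence event was queued, that event shows up as a fresh bit
and triggers a second iteration instead of being merged into the first.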

This way of processing events also allows me to easily filter events.
E.g. we tolerate link flaps occurring during the first 100 ms after
enabling the slot simply by atomically removing bits from pending_events
at a certain point. See commit 6c35a1ac3da6 ("PCI: pciehp: Tolerate
initially unstable link").
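
A rough sketch of such filtering, again as a userspace model with C11
atomics and hypothetical names (the commit cited above is the
authoritative reference for what the driver really does):

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical event bits, not the driver's actual values. */
#define EV_PRESENCE 0x1u
#define EV_LINK     0x2u
#define EV_BUTTON   0x4u

static atomic_uint pending_bits;

/*
 * Atomically drop the bits in "mask" from the pending set.
 * Returns those bits of "mask" that had actually been pending.
 */
static unsigned int discard_events(unsigned int mask)
{
	return atomic_fetch_and(&pending_bits, ~mask) & mask;
}

/* Usage: ignore a link flap seen while the slot settles. */
void filter_example(void)
{
	atomic_store(&pending_bits, EV_LINK | EV_BUTTON);

	unsigned int dropped = discard_events(EV_LINK | EV_PRESENCE);

	assert(dropped == EV_LINK);
	assert(atomic_load(&pending_bits) == EV_BUTTON);
}
```

Events outside the mask are left alone and get handled in the next
iteration as usual.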

Now what I *could* do would be to make the events currently being
processed public, e.g. by adding an "atomic_t current_events" to
struct controller. Then I could wait in pciehp_sysfs_enable_slot() /
_disable_slot() until both "pending_events" and "current_events"
become empty. But it would basically amount to the same as this patch,
and we don't really need to know *which* events are being processed,
only the *fact* that events are being processed.
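
For comparison, the "current_events" alternative might look roughly
like the following userspace model (C11 atomics, all names
hypothetical, single consumer assumed). It is only meant to show how
much ordering care the extra field would need, not how the driver is
or should be written.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_uint pending_ev;	/* models "pending_events" */
static atomic_uint current_ev;	/* hypothetical "current_events" */

/* IRQ thread: publish the events as in flight, then dequeue them. */
static unsigned int begin_events(void)
{
	unsigned int ev = atomic_load(&pending_ev);

	atomic_fetch_or(&current_ev, ev);	/* publish first */
	atomic_fetch_and(&pending_ev, ~ev);	/* then dequeue */
	return ev;
}

/* IRQ thread: the events taken by begin_events() are done. */
static void end_events(unsigned int ev)
{
	/* The bits being retired must have been published before. */
	assert((atomic_load(&current_ev) & ev) == ev);
	atomic_fetch_and(&current_ev, ~ev);
}

/* sysfs side: true once nothing is queued and nothing is in flight. */
static bool request_done(void)
{
	/*
	 * Read pending_ev first: if it is empty because the thread
	 * dequeued it, current_ev has already been published.
	 */
	if (atomic_load(&pending_ev) != 0)
		return false;
	return atomic_load(&current_ev) == 0;
}
```

The waiter only ever tests both fields against zero, which is the point
made above: the individual bits in "current_events" carry no
information that the sysfs path needs.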

Let me know if you have further questions regarding the pciehp
processing logic.

Thanks,

Lukas