Re: [PATCH 3/3] pciehp: Fix race condition handling surprise link-down

From: Keith Busch
Date: Thu Dec 08 2016 - 12:11:45 EST


On Thu, Dec 08, 2016 at 09:11:58AM -0600, Bjorn Helgaas wrote:
> On Wed, Dec 07, 2016 at 07:04:33PM -0500, Keith Busch wrote:
> >
> > It currently looks safe to nest the p_slot->lock under the
> > p_slot->hotplug_lock if that is you recommendation.
>
> I'm not making a recommendation; that would take a lot more thought
> than I've given this.
>
> There are at least three locks in pciehp:
>
> struct controller.ctrl_lock
> struct slot.lock
> struct slot.hotplug_lock
>
> There shouldn't really be any performance paths in pciehp, so I'm
> pretty doubtful that we need such complicated locking.

They are protecting different things, but I agree it looks like room
for simplification exists.

> > Alternatively we could fix this if we used an ordered work queue for
> > the slot's work, but that is a significantly more complex change.
>
> You mean we can currently execute things from the work queue in a
> different order than they were enqueued? That sounds ... difficult to
> analyze, to say the least.

The events are dequeued in order, but they don't wait for the previous
to complete, so pciehp's current work queue can have multiple events
executing in parallel. That's part of why rapid pciehp slot events are
a little more difficult to follow, and I think we may even be unsafely
relying on the order the mutexes are obtained from these work events.

Partly unrelated, we could process surprise removals significantly
faster (microseconds vs. seconds), with the limited pci access series
here, giving fewer simultaneously executing events to consider:

https://www.spinics.net/lists/linux-pci/msg55585.html

Do you have any concerns with that series?

> I don't know much about work queues, and Documentation/workqueue.txt
> doesn't mention alloc_ordered_workqueue(). Is that what you are
> referring to?

Yes, the alloc_ordered_workqueue is what I had in mind, though switching
to that is not as simple as calling the different API. I am looking into
that for longer term, but for the incremental fix, do you think we can
go forward with Raj's proposal?