Re: [PATCH 3/3] pciehp: Fix race condition handling surprise link-down
From: Bjorn Helgaas
Date: Thu Dec 08 2016 - 10:12:09 EST
On Wed, Dec 07, 2016 at 07:04:33PM -0500, Keith Busch wrote:
> On Wed, Dec 07, 2016 at 05:40:54PM -0600, Bjorn Helgaas wrote:
> > On Sat, Nov 19, 2016 at 12:32:47AM -0800, Ashok Raj wrote:
> > > --- a/drivers/pci/hotplug/pciehp_ctrl.c
> > > +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> > > @@ -182,6 +182,7 @@ static void pciehp_power_thread(struct work_struct *work)
> > > switch (info->req) {
> > > case DISABLE_REQ:
> > > mutex_lock(&p_slot->hotplug_lock);
> > > + p_slot->state = POWEROFF_STATE;
> >
> > It sounds right that p_slot->state should be protected.
> >
> > It looks like handle_button_press_event() and
> > pciehp_sysfs_enable_slot() hold p_slot->lock while updating
> > p_slot->state.
> >
> > You're setting "state = POWEROFF_STATE" while holding
> > p_slot->hotplug_lock (not p_slot->lock). Four lines down, we set
> > "state = STATIC_STATE", but this time we're holding p_slot->lock.
> >
> > What is the difference between the p_slot->lock and
> > p_slot->hotplug_lock? Do we need both? How do we know which one to
> > use?
> >
> > I'm very confused.
>
> This is _very_ confusing. :)
>
> The p_slot->hotplug_lock serializes a power off and a power on event. We
> only want to set the state when one of those events gets to execute
> rather than when the event is queued. Changing the state when the event
> is queued can trigger a never ending up-down sequence.
>
> It currently looks safe to nest the p_slot->lock under the
> p_slot->hotplug_lock if that is you recommendation.
I'm not making a recommendation; that would take a lot more thought
than I've given this.
There are at least three locks in pciehp:
struct controller.ctrl_lock
struct slot.lock
struct slot.hotplug_lock
There shouldn't really be any performance paths in pciehp, so I'm
pretty doubtful that we need such complicated locking.
> Alternatively we could fix this if we used an ordered work queue for
> the slot's work, but that is a significantly more complex change.
You mean we can currently execute things from the work queue in a
different order than they were enqueued? That sounds ... difficult to
analyze, to say the least.
I don't know much about work queues, and Documentation/workqueue.txt
doesn't mention alloc_ordered_workqueue(). Is that what you are
referring to?
Bjorn