RE: System reboot hangs due to race against devices_kset->list triggered by SCSI FC workqueue

From: Hugh Daschbach
Date: Thu Mar 04 2010 - 14:09:18 EST


Alan Stern [mailto:stern@xxxxxxxxxxxxxxxxxxx] writes:

> On Wed, 3 Mar 2010, Hugh Daschbach wrote:
>
>> Alan Stern [mailto:stern@xxxxxxxxxxxxxxxxxxx] writes:
>>
>> > On Wed, 3 Mar 2010, Hugh Daschbach wrote:
>> >
>> >> > Can't we just protect the list? What is wanting to write to the list
>> >> > while shutdown is happening?
>> >>
>> >> Indeed, Alan suggested holding the kset spinlock while iterating the
...
>> > What I meant was that you should hold the spinlock while finding and
>> > unlinking the last device on the list. Clearly you shouldn't hold it
>> > while calling the device shutdown routine.
>>
>> I misunderstood. But I believe insertion and deletion are properly
>> serialized. It looks to me like the list structure is intact. It's the
>> iterator that's been driven off into the weeds.
...
>> Just to be clear, the list we're talking about is "list" in "struct
>> kset". And the nodes of the list are chained by "entry" in "struct
>> kobject".
...
>> At a minimum the change looks something like the patch below.
...
> If you really want to do this then you should remove the lock member
> from struct kset. However this seems like an awful lot of work
> compared to my original suggestion -- something like this (untested,
> and you'll want to add comments):
...

I'm not sure I do want to pursue this. It does seem particularly
invasive at a fundamental level of a core data structure.

Apparently I still don't understand your original suggestion. I'd
prefer to, especially if it leads to a simpler fix. The loop in
device_shutdown() looks something like:

	struct device *dev, *devn;

	list_for_each_entry_safe_reverse(dev, devn, &devices_kset->list,
					 kobj.entry) {
		if (dev->bus && dev->bus->shutdown) {
			dev->bus->shutdown(dev);
		} else if (dev->driver && dev->driver->shutdown) {
			dev->driver->shutdown(dev);
		}
	}

*dev gets delinked by kobj_kset_leave(), which is called indirectly from
dev->*->shutdown(dev). The unlink itself is protected by the spinlock.

The secondary thread similarly calls kobj_kset_leave(). But when the
secondary thread calls the shutdown routine for the device that devn
points to, the loop hangs.

Is there some way I can detect that devn no longer points to a valid
device upon return from dev->*->shutdown(dev)? Or, where else can I
look to better understand your suggestion?
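
For what it's worth, here is a minimal userspace sketch of how I
currently read the problem and the suggestion. It is a model, not kernel
code: the list helpers are hand-rolled stand-ins for the kernel's, the
names walk_with_cached_prev() and drain_from_tail() are made up for
illustration, and the "concurrent" removal is simulated inline rather
than by a real second thread. It also assumes that kobj_kset_leave()
unlinks with list_del_init(), which leaves the removed entry pointing at
itself.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Modelled on list_del_init(): unlink, then point the node at itself. */
static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = n;
}

/*
 * Model of the current loop: the predecessor (devn) is cached before the
 * callback runs. "victim" is unlinked during the first callback to mimic
 * the second thread. Returns the iteration count, capped at max_steps.
 */
static int walk_with_cached_prev(struct list_head *head,
				 struct list_head *victim, int max_steps)
{
	struct list_head *pos = head->prev, *n = pos->prev;
	int steps = 0;

	while (pos != head && steps < max_steps) {
		if (steps == 0 && victim)
			list_del_init(victim);	/* removal during shutdown */
		steps++;
		pos = n;	/* may now be a self-linked, unlinked node */
		n = pos->prev;
	}
	return steps;
}

/*
 * Model of the alternative: re-read the tail on every pass and unlink it
 * before the callback, so no pointer is cached across the callback.
 */
static int drain_from_tail(struct list_head *head,
			   struct list_head *victim, int max_steps)
{
	int steps = 0;

	while (!list_empty(head) && steps < max_steps) {
		struct list_head *last = head->prev;

		assert(last != head);	/* list is non-empty here */
		list_del_init(last);	/* would be done under the lock */
		if (steps == 0 && victim && victim != last)
			list_del_init(victim);	/* removal during shutdown */
		steps++;
	}
	return steps;
}
```

With three entries and the middle one removed during the first callback,
walk_with_cached_prev() runs until the step cap, because the cached
entry's prev pointer now leads back to itself. drain_from_tail() finishes
after two passes. In the kernel I would expect the tail lookup and unlink
to sit under the kset's list_lock, with a reference taken on the device
before the lock is dropped for the shutdown call.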

Thanks,
Hugh
