Re: [BUG] Oops when SCSI device under multipath is removed

From: Alan Stern
Date: Thu Aug 11 2011 - 11:16:24 EST


On Thu, 11 Aug 2011, James Bottomley wrote:

> > > Well, it's just hiding the problem. The essential problem is that only
> > > block has the correctly refcounted knowledge to know the last release of
> > > the queue reference. Until that time, the holder of the reference can
> > > use the queue regardless of whether blk_cleanup_queue() has been called.
> > > This is the race you complain about since use of the queue involves the
> > > lock which should be guarded by QUEUE_DEAD checks.
> > >
> > > This is essentially unfixable with function calls. The only way to fix
> > > it is to have a callback model for freeing the external lock.
> >
> > Assuming the queue is associated with a device, the queue could take a
> > reference to the device, dropping that reference when the queue is
> > freed. Then the lock could safely be freed at the same time as the
> > device.
>
> If that assumption is correct, there's no point refcounting the queue at
> all because its use is entirely subordinated to the lifecycle of the
> associated device.

That's true. Why wasn't it done that way originally? Are there queues
that aren't associated with devices?
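
To make the proposal concrete, here is a rough userspace sketch of the
scheme, not kernel code. Every name in it (model_device, model_queue,
device_put, queue_put, locks_freed) is hypothetical and made up for
illustration; it only models the idea that the queue pins the device, so
the externally supplied lock cannot be freed before the last queue
reference goes away.

```c
#include <stdlib.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_device {
	int refcount;
	int *lock;		/* stands in for the external queue lock */
};

struct model_queue {
	int refcount;
	struct model_device *dev;	/* pinned for the queue's lifetime */
	int *lock;			/* borrowed from dev, never freed here */
};

static int locks_freed;		/* instrumentation: counts lock frees */

static struct model_device *device_create(void)
{
	struct model_device *dev = malloc(sizeof(*dev));

	if (!dev)
		return NULL;
	dev->lock = calloc(1, sizeof(*dev->lock));
	if (!dev->lock) {
		free(dev);
		return NULL;
	}
	dev->refcount = 1;	/* the creator's reference */
	return dev;
}

static void device_get(struct model_device *dev)
{
	dev->refcount++;
}

/* The lock is freed only here, in the device's final release. */
static void device_put(struct model_device *dev)
{
	if (--dev->refcount == 0) {
		free(dev->lock);
		locks_freed++;
		free(dev);
	}
}

static struct model_queue *queue_create(struct model_device *dev)
{
	struct model_queue *q = malloc(sizeof(*q));

	if (!q)
		return NULL;
	device_get(dev);	/* the queue takes its own device reference */
	q->refcount = 1;
	q->dev = dev;
	q->lock = dev->lock;
	return q;
}

/* Dropping the last queue reference drops the queue's device reference. */
static void queue_put(struct model_queue *q)
{
	if (--q->refcount == 0) {
		struct model_device *dev = q->dev;

		free(q);
		device_put(dev);
	}
}
```

In this model, if the creator calls device_put() while a queue reference
is still outstanding, the lock stays valid; it is freed only when the
final queue_put() releases the queue's reference to the device.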

> Plus all the wittering about my previous patch is
> pointless, because blk_cleanup_queue() has to do the final put of the
> queue in the lock free path (otherwise the assumption is violated).
>
> However, much as I'd like to accept this rosy view, the original oops
> that started all of this in 2.6.38 was someone catching something that
> still held a reference to a SCSI queue after the device release
> function had been called.

Not according to your commit log. You wrote that the reference was
taken after scsi_remove_device() had been called -- but the device
release function is scsi_device_dev_release_usercontext().

Alan Stern
