Re: [PATCH 07/11] blkcg: make request_queue bypassing on allocation

From: James Bottomley
Date: Tue Apr 17 2012 - 08:05:15 EST


On Fri, 2012-04-13 at 14:16 -0700, Tejun Heo wrote:
> On Fri, Apr 13, 2012 at 02:05:48PM -0700, Tejun Heo wrote:
> > On Fri, Apr 13, 2012 at 04:55:01PM -0400, Vivek Goyal wrote:
> > > But neither seems to be the case here. So to make sure that blkg_lookup()
> > > under rcu will see the updated value of queue flag (bypass), are we
> > > relying on the fact that caller should see the DEAD flag and not go
> > > ahead with blkg_lookup()? If yes, at least it is not obvious.
> >
> > We're relying on the fact that it doesn't matter anymore because all
> > blkgs will be shot down in the queue cleanup path which goes through rcu
> > free, which is different from deactivating individual policies. It
> > indeed is subtle. Umm... this is starting to get ridiculous. Why the
> > hell was megaraid messing with so many queues anyways?
>
> I suppose megaraid depends on sequential LUN scan, which SCSI
> implements by creating an sdev for each LUN, checking whether it
> actually exists and then destroying the sdev if not. Urgh.... so, we
> seem to be stuck with it.

Right, sorry ... it's not just megaraid, it's any SCSI-2 device. The
standard says we have to probe the LUNs one at a time to see if they're
there. SCSI-3 onwards supports the REPORT LUNS command, which just
returns a list and so obviates the need to probe each one, but not all
older (and, to be frank, USB) devices support this.

> So, the current code is technically correct although subtle like hell.
> We can RCU defer blk_put_queue() from blk_cleanup_queue() using
> call_rcu() to make clear that an RCU grace period is necessary there.
> Any better ideas?

Not really ... except that perhaps we might redo LUN scanning to use
just a single queue, i.e. repurpose the LUN underneath rather than
destroy the old queue and set up a new one? It's a bit counterintuitive,
but it shouldn't be impossible.
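
Purely as an illustration of the difference, here is a userspace toy
model, not kernel code. Everything in it (struct toy_queue, toy_probe(),
the two scan functions, the choice of LUNs 0 and 3 as present) is a
hypothetical stand-in, and it assumes a found LUN simply keeps whatever
queue was used to probe it. It only counts how many queues each strategy
allocates and tears down during a scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_MAX_LUNS 8

/* Hypothetical stand-in for a request queue; NOT struct request_queue. */
struct toy_queue {
    int lun;                /* LUN this queue currently points at */
};

struct scan_stats {
    int found;              /* LUNs that answered the probe */
    int allocated;          /* queues allocated during the scan */
    int destroyed;          /* queues torn down during the scan */
};

static struct toy_queue *toy_queue_alloc(struct scan_stats *st)
{
    struct toy_queue *q = malloc(sizeof(*q));
    assert(q != NULL);
    q->lun = -1;
    st->allocated++;
    return q;
}

static void toy_queue_destroy(struct toy_queue *q, struct scan_stats *st)
{
    free(q);
    st->destroyed++;
}

/* Pretend probe: in this toy only LUNs 0 and 3 exist. */
static bool toy_probe(const struct toy_queue *q)
{
    return q->lun == 0 || q->lun == 3;
}

/* Sequential scan as described above: a fresh queue per probed LUN,
 * destroyed again if the LUN turns out not to exist. */
static struct scan_stats scan_queue_per_lun(void)
{
    struct scan_stats st = {0, 0, 0};
    struct toy_queue *kept[TOY_MAX_LUNS] = {0};

    for (int lun = 0; lun < TOY_MAX_LUNS; lun++) {
        struct toy_queue *q = toy_queue_alloc(&st);
        q->lun = lun;
        if (toy_probe(q))
            kept[st.found++] = q;
        else
            toy_queue_destroy(q, &st);
    }
    for (int i = 0; i < st.found; i++)
        free(kept[i]);      /* toy cleanup, not counted as scan teardown */
    return st;
}

/* Single-queue idea: retarget one scratch queue at each LUN in turn and
 * only allocate a new one after a found LUN has taken the old one. */
static struct scan_stats scan_reused_queue(void)
{
    struct scan_stats st = {0, 0, 0};
    struct toy_queue *kept[TOY_MAX_LUNS] = {0};
    struct toy_queue *scratch = NULL;

    for (int lun = 0; lun < TOY_MAX_LUNS; lun++) {
        if (!scratch)
            scratch = toy_queue_alloc(&st);
        scratch->lun = lun;             /* repurpose, do not reallocate */
        if (toy_probe(scratch)) {
            kept[st.found++] = scratch; /* found LUN keeps this queue */
            scratch = NULL;
        }
    }
    if (scratch)
        toy_queue_destroy(scratch, &st);
    for (int i = 0; i < st.found; i++)
        free(kept[i]);      /* toy cleanup, not counted as scan teardown */
    return st;
}
```

With 8 LUNs and two present, scan_queue_per_lun() allocates 8 queues and
destroys 6 during the scan, while scan_reused_queue() allocates 3 and
destroys 1. The real question of how to retarget a live queue safely is
of course not modelled here.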

James

