Re: [PATCH 12/13] Make generic_make_request handle arbitrarily largebios

From: Kent Overstreet
Date: Fri May 18 2012 - 04:15:45 EST


On Fri, May 18, 2012 at 06:05:50PM +1000, NeilBrown wrote:
>
> Hi Kent,
> there is lots of good stuff in this series and I certainly hope a lot of it
> can eventually go upstream.
>
> However there is a part of this patch that I don't think is safe:
>
>
> > +static void __bio_submit_split(struct closure *cl)
> > +{
> > + struct bio_split_hook *s = container_of(cl, struct bio_split_hook, cl);
> > + struct bio *bio = s->bio, *n;
> > +
> > + do {
> > + n = bio_split(bio, bio_max_sectors(bio),
> > + GFP_NOIO, s->q->bio_split);
> > + if (!n)
> > + continue_at(cl, __bio_submit_split, bio_split_wq);
> > +
> > + closure_get(cl);
> > + generic_make_request(n);
> > + } while (n != bio);
> > +
> > + continue_at(cl, bio_submit_split_done, NULL);
> > +}
>
> Firstly a small point: Can bio_split ever return NULL here? I don't
> think it can, so there is no need to test.
> But if it can, then calling generic_make_request(NULL) doesn't seem like a
> good idea.
>
> More significantly though::
> This is called from generic_make_request which can be called recursively and
> enforces a tail-recursion semantic.
> If that generic_make_request was a recursive call, then the
> generic_make_request in __bio_submit_split will not start the request, but
> will queue the bio for later handling. If we then call bio_split again, we
> could try to allocation from a mempool while we are holding one entry
> allocated from that pool captive. This can deadlock.
>
> i.e. if the original bio is so large that it needs to be split into 3 pieces,
> then we will try to allocate the second piece before the first piece has a
> chance to be released. If this happens in enough threads to exhaust the pool
> (4 I think), it will deadlock.
>
> I realise this sounds like a very unlikely case, but of course they happen.
>
> One possible approach might be to count how many splits will be required,
> then have an interface to mempools so you can allocate them all at once,
> or block and wait.

Yeah, I actually thought of that (I probably should've documented it,
though).

That's why the code is checking if bio_split() returned NULL; my verison
of bio_split() checks if current->bio_list is non NULL, and if it is it
doesn't pass __GFP_WAIT to bio_alloc_bioset().

(I was debating whether bio_split() should do that itself or leave it up
to the caller. I tend to think it's too easy to accidentally screw up -
and not notice it - so it should be enforced by generic code. Possibly
the check should be moved to bio_alloc_bioset(), though.)

If we do get a memory allocation failure, then we just punt to workqueue
- continue_at() runs __bio_submit_split from the bio_split workqueue -
where we can safely use the mempool.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/