Re: [PATCH v3] block: make sure big bio is splitted into at most 256 bvecs

From: Kent Overstreet
Date: Mon Aug 15 2016 - 15:13:09 EST


On Mon, Aug 15, 2016 at 11:23:28AM -0700, Christoph Hellwig wrote:
> On Mon, Aug 15, 2016 at 11:11:22PM +0800, Ming Lei wrote:
> > After arbitrary bio size is supported, the incoming bio may
> > be very big. We have to split the bio into small bios so that
> > each holds at most BIO_MAX_PAGES bvecs for safety reason, such
> > as bio_clone().
>
> I still think working around a rough driver submitting too large
> I/O is a bad thing until we've done a full audit of all consuming
> bios through ->make_request, and we've enabled it for the common
> path as well.

bcache originally had workaround code to split too-large bios when it first went
upstream - that was dropped only after the patches to make
generic_make_request() handle arbitrary size bios went in. So to do what you're
suggesting would mean reverting that bcache patch and bringing that code back,
which from my perspective would be a step in the wrong direction. I just want to
get this over and done with.

re: interactions with other drivers - bio_clone() has already been changed to
only clone the biovecs that are live for the current bi_iter, so there shouldn't
be any safety issues. A driver would have to be intentionally doing its own
open-coded bio cloning that clones all of bi_io_vec, not just the active ones.
But a driver doing that is already broken, because a driver isn't allowed to
look at bi_vcnt on a bio it doesn't own: bi_vcnt is 0 on bios that don't own
their biovec (i.e. that were created by bio_clone_fast).
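
To make the bi_iter point concrete, here is a rough userspace sketch of what
"clone only the live biovecs" means. It is a toy model, not kernel code: the
toy_* types and functions are made up for illustration and only loosely mirror
struct bio, struct bio_vec and struct bvec_iter. It assumes non-empty bvecs and
an iterator size that fits within the vector. Note that the walk never reads
the source's vcnt, which is why it still works when vcnt is 0.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for the kernel structures; names and layout are hypothetical. */
struct toy_bvec { unsigned len; unsigned offset; };
struct toy_iter {
	unsigned idx;	/* index of the current bvec */
	unsigned size;	/* bytes remaining in the bio */
	unsigned done;	/* bytes already consumed in the current bvec */
};
struct toy_bio {
	struct toy_bvec *vec;
	unsigned vcnt;	/* 0 when this bio does not own vec */
	struct toy_iter iter;
};

/* Count the segments still live for the iterator, without touching vcnt. */
static unsigned toy_live_segments(const struct toy_bio *bio)
{
	struct toy_iter it = bio->iter;
	unsigned n = 0;

	while (it.size) {
		unsigned avail = bio->vec[it.idx].len - it.done;
		unsigned step = avail < it.size ? avail : it.size;

		it.size -= step;
		it.idx++;
		it.done = 0;
		n++;
	}
	return n;
}

/* Clone only the live segments; the clone owns its vec, so its vcnt is set. */
static struct toy_bio *toy_clone_live(const struct toy_bio *src)
{
	assert(src && src->vec);

	unsigned n = toy_live_segments(src);
	struct toy_bio *dst = malloc(sizeof(*dst));
	struct toy_iter it = src->iter;

	if (!dst)
		return NULL;
	dst->vec = calloc(n ? n : 1, sizeof(*dst->vec));
	if (!dst->vec) {
		free(dst);
		return NULL;
	}
	for (unsigned i = 0; i < n; i++) {
		unsigned avail = src->vec[it.idx].len - it.done;
		unsigned step = avail < it.size ? avail : it.size;

		/* A partially consumed first bvec starts further into its page. */
		dst->vec[i].offset = src->vec[it.idx].offset + it.done;
		dst->vec[i].len = step;
		it.size -= step;
		it.idx++;
		it.done = 0;
	}
	dst->vcnt = n;
	dst->iter = (struct toy_iter){ .idx = 0, .size = src->iter.size, .done = 0 };
	return dst;
}
```

The clone is sized by what the iterator still covers, not by how many entries
the original vector has, so an open-coded clone keyed on vcnt would get the
wrong answer for a bio it doesn't own.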

And I audited the cloning and bi_vcnt usage very thoroughly back when I was
working on immutable biovecs, and I had to do a fair amount of
cleanup/refactoring before that stuff could go in.

>
> >  	bool do_split = true;
> >  	struct bio *new = NULL;
> >  	const unsigned max_sectors = get_max_io_size(q, bio);
> > +	unsigned bvecs = 0;
> > +
> > +	*no_merge = true;
> >
> >  	bio_for_each_segment(bv, bio, iter) {
> >  		/*
> > +		 * With arbitrary bio size, the incoming bio may be very
> > +		 * big. We have to split the bio into small bios so that
> > +		 * each holds at most BIO_MAX_PAGES bvecs because
> > +		 * bio_clone() can fail to allocate big bvecs.
> > +		 *
> > +		 * It should have been better to apply the limit per
> > +		 * request queue in which bio_clone() is involved,
> > +		 * instead of globally. The biggest blocker is
> > +		 * bio_clone() in bio bounce.
> > +		 *
> > +		 * If bio is splitted by this reason, we should allow
> > +		 * to continue bios merging.
> > +		 *
> > +		 * TODO: deal with bio bounce's bio_clone() gracefully
> > +		 * and convert the global limit into per-queue limit.
> > +		 */
> > +		if (bvecs++ >= BIO_MAX_PAGES) {
> > +			*no_merge = false;
> > +			goto split;
> > +		}
>
> That being said this simple if check here is simple enough that it's
> probably fine. But I see no need to uglify the whole code path
> with that no_merge flag. Please drop it for now, and if we start
> caring for this path in common code we should just move the
> REQ_NOMERGE setting into the actual blk_bio_*_split helpers.

Agreed about the no_merge thing.
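
For illustration only, a userspace sketch of the check with the no_merge flag
dropped: the bvec cap becomes just one more reason to stop and split. This is
not the kernel function. toy_split_point() and TOY_MAX_BVECS are made-up names,
the cap of 256 is taken from the patch subject, and the model splits only at
whole-segment boundaries, which the real splitting code does not have to do.

```c
#include <stddef.h>

#define TOY_MAX_BVECS 256u	/* assumed cap, standing in for BIO_MAX_PAGES */

/*
 * Return how many leading segments the first piece keeps.
 * A return value equal to nsegs means no split is needed.
 */
static size_t toy_split_point(const unsigned *seg_sectors, size_t nsegs,
			      unsigned max_sectors)
{
	unsigned sectors = 0;
	size_t bvecs = 0;

	for (size_t i = 0; i < nsegs; i++) {
		/* Same shape as the patch's check, minus the no_merge out-param. */
		if (bvecs >= TOY_MAX_BVECS)
			return i;
		/* Pre-existing kind of limit: stop once the size cap is hit. */
		if (sectors + seg_sectors[i] > max_sectors)
			return i;
		sectors += seg_sectors[i];
		bvecs++;
	}
	return nsegs;
}
```

With 300 segments of 8 sectors each and a generous max_sectors, the first piece
keeps 256 segments; with max_sectors of 80 the size limit wins and it keeps 10.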