Re: blk: improve order of bio handling in generic_make_request()

From: Lars Ellenberg
Date: Wed Mar 08 2017 - 12:32:23 EST


On 8 March 2017 at 17:40, Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:
>
> On Wed, 8 Mar 2017, NeilBrown wrote:
> > I don't think this will fix the DM snapshot deadlock by itself.
> > Rather, it make it possible for some internal changes to DM to fix it.
> > The DM change might be something vaguely like:
> >
> > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > index 3086da5664f3..06ee0960e415 100644
> > --- a/drivers/md/dm.c
> > +++ b/drivers/md/dm.c
> > @@ -1216,6 +1216,14 @@ static int __split_and_process_non_flush(struct clone_info *ci)
> >
> > len = min_t(sector_t, max_io_len(ci->sector, ti), ci->sector_count);
> >
> > + if (len < ci->sector_count) {
> > + struct bio *split = bio_split(bio, len, GFP_NOIO, fs_bio_set);
>
> fs_bio_set is a shared bio set, so it is prone to deadlocks. For this
> change, we would need two bio sets per dm device, one for the split bio
> and one for the outgoing bio. (this also means having one more kernel
> thread per dm device)
>
> It would be possible to avoid having two bio sets if the incoming bio were
> the same as the outgoing bio (allocate a small structure, move bi_end_io
> and bi_private into it, replace bi_end_io and bi_private with pointers to
> device mapper and send the bio to the target driver), but it would need
> much more changes - basically rewrite the whole bio handling code in dm.c
> and in the targets.
>
> Mikulas
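
(Purely to illustrate the "reuse the incoming bio" idea Mikulas sketches
above; all names below are made up, and a real implementation would pull
the hook structure from a per-device mempool rather than kmalloc:)

	struct dm_hook {
		bio_end_io_t		*orig_end_io;	/* saved from the incoming bio */
		void			*orig_private;
		struct mapped_device	*md;
	};

	static void dm_hook_end_io(struct bio *bio)
	{
		struct dm_hook *h = bio->bi_private;

		/* restore the caller's completion before ending the bio */
		bio->bi_end_io  = h->orig_end_io;
		bio->bi_private = h->orig_private;
		kfree(h);
		bio_endio(bio);
	}

	static int dm_hook_bio(struct mapped_device *md, struct bio *bio)
	{
		struct dm_hook *h = kmalloc(sizeof(*h), GFP_NOIO);

		if (!h)
			return -ENOMEM;
		h->orig_end_io  = bio->bi_end_io;
		h->orig_private = bio->bi_private;
		h->md           = md;
		bio->bi_end_io  = dm_hook_end_io;
		bio->bi_private = h;
		/* the bio itself can now be handed on to the target driver */
		return 0;
	}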

"back then" (see previously posted link into ML archive)
I suggested this:

...

A bit of conflict here may be that DM has all its own
split and clone and queue magic, and wants to process
"all of the bio" before returning back to generic_make_request().

To change that, __split_and_process_bio() and all its helpers
would need to learn to "push back" (pieces of) the bio they are
currently working on, not via "DM_ENDIO_REQUEUE", but via
bio_list_add_head(&current->bio_lists->queue, piece_to_be_done_later).

Then, after processing each piece, they would *return* all the way up
to the top-level generic_make_request(), where the
recursion-to-iteration logic makes sure that all deeper-level bios,
submitted via recursive calls to generic_make_request(), are processed
before the next, pushed-back, piece of the "original incoming" bio.

And *not* do their own iteration over all pieces first.
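
In code, the push-back itself could be as small as this (hypothetical
helper name; bio_lists->queue is the field from the earlier proposal
referenced above):

	/* Hand the not-yet-processed remainder of the incoming bio back to
	 * generic_make_request().  Head insertion keeps the remainder ahead
	 * of other bios already queued at this level, while the deeper-level
	 * bios just generated are still handled first. */
	static void dm_defer_remainder(struct bio *remainder)
	{
		bio_list_add_head(&current->bio_lists->queue, remainder);
	}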

It is probably not quite as easy as just dropping the while loop,
using bio_advance(), and pushing that "advanced" bio back onto
current->...queue, but perhaps something like this:

static void __split_and_process_bio(struct mapped_device *md,
				    struct dm_table *map, struct bio *bio)
	...
	ci.bio = bio;
	ci.sector_count = bio_sectors(bio);
	while (ci.sector_count && !error)
		error = __split_and_process_non_flush(&ci);
	...

where the while loop would become something like:

	error = __split_and_process_non_flush(&ci);
	if (ci.sector_count) {
		/* advance past the piece just processed ... */
		bio_advance(bio, (bio_sectors(bio) - ci.sector_count) << 9);
		/* ... and let generic_make_request() get back to the rest */
		bio_list_add_head(&current->bio_lists->queue, bio);
	}
	...

Something like that, maybe?


Needs to be adapted to this new and improved recursion-to-iteration
logic, obviously. Would that be doable, or does device-mapper for some
reason really *need* its own iteration loop (which, because it is
called from within generic_make_request(), can never submit anything
directly to any device, and therefore needs all these helper threads
just in case)?

Lars