Re: blk: improve order of bio handling in generic_make_request()

From: NeilBrown
Date: Tue Mar 07 2017 - 16:14:27 EST


On Tue, Mar 07 2017, Mike Snitzer wrote:

> On Tue, Mar 07 2017 at 12:05pm -0500,
> Jens Axboe <axboe@xxxxxxxxx> wrote:
>
>> On 03/07/2017 09:52 AM, Mike Snitzer wrote:
>> > On Tue, Mar 07 2017 at 3:49am -0500,
>> > Jack Wang <jinpu.wang@xxxxxxxxxxxxxxxx> wrote:
>> >
>> >>
>> >>
>> >> On 06.03.2017 21:18, Jens Axboe wrote:
>> >>> On 03/05/2017 09:40 PM, NeilBrown wrote:
>> >>>> On Fri, Mar 03 2017, Jack Wang wrote:
>> >>>>>
>> >>>>> Thanks Neil for pushing the fix.
>> >>>>>
>> >>>>> We can optimize generic_make_request a little bit:
>> >>>>> - assign the bio_list struct directly instead of init and merge
>> >>>>> - remove duplicate code
>> >>>>>
>> >>>>> I think it's better to squash it into your fix.
>> >>>>
>> >>>> Hi Jack,
>> >>>> I don't object to your changes, but I'd like to see a response from
>> >>>> Jens first.
>> >>>> My preference would be to get the original patch in first; then other
>> >>>> changes that build on it, such as this one, can be added. Until the
>> >>>> core change lands, any other work is pointless.
>> >>>>
>> >>>> Of course, if Jens wants this merged before he'll apply it, I'll
>> >>>> happily do that.
>> >>>
>> >>> I like the change, and thanks for tackling this. It's been a pending
>> >>> issue for way too long. I do think we should squash Jack's patch
>> >>> into the original, as it does clean up the code nicely.
>> >>>
>> >>> Do we have a proper test case for this, so we can verify that it
>> >>> does indeed also work in practice?
>> >>>
>> >> Hi Jens,
>> >>
>> >> I can trigger a deadlock in RAID1 with the test below:
>> >>
>> >> I create one md array from one local loop device and one remote scsi
>> >> device exported via SRP. While running fio with mixed read/write on top
>> >> of the md device, I force_close the session on the storage side.
>> >> mdX_raid1 then waits on free_array in D state, and many fio threads are
>> >> also in D state in wait_barrier.
>> >>
>> >> With Neil's patch above, I can no longer trigger it.
>> >>
>> >> The discussion was in link below:
>> >> http://www.spinics.net/lists/raid/msg54680.html
>> >
>> > In addition to Jack's MD raid test there is a DM snapshot deadlock test,
>> > albeit unpolished and needing some work to get running, see:
>> > https://www.redhat.com/archives/dm-devel/2017-January/msg00064.html
>>
>> Can you run this patch with that test, reverting your DM workaround?
>
> Yeap, will do. Last time Mikulas tried a similar patch it still
> deadlocked. But I'll give it a go (likely tomorrow).

I don't think this will fix the DM snapshot deadlock by itself.
Rather, it makes it possible for some internal changes to DM to fix it.
The DM change might be something vaguely like:

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 3086da5664f3..06ee0960e415 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1216,6 +1216,14 @@ static int __split_and_process_non_flush(struct clone_info *ci)
 
 	len = min_t(sector_t, max_io_len(ci->sector, ti), ci->sector_count);
 
+	if (len < ci->sector_count) {
+		struct bio *split = bio_split(ci->bio, len, GFP_NOIO, fs_bio_set);
+		bio_chain(split, ci->bio);
+		generic_make_request(ci->bio);
+		ci->bio = split;
+		ci->sector_count = len;
+	}
+
 	r = __clone_and_map_data_bio(ci, ti, ci->sector, &len);
 	if (r < 0)
 		return r;

Instead of looping inside DM, this change causes the remainder to be
passed to generic_make_request(), so DM only handles one region at a
time. There is then only one loop, in the top-level
generic_make_request(), and that loop will now reliably handle bios in
the "right" order.
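The ordering point is easiest to see in a toy model. The following Python
sketch is not kernel code: bios are plain (name, level) tuples, and the
`children` map stands in for whatever bios a driver resubmits while handling a
bio. It merely contrasts the old FIFO dispatch in generic_make_request() with
the patched behaviour, where bios generated for lower-level devices are
dispatched ahead of the queued remainder of a split:

```python
def submit(top, children, depth_first):
    """Toy model of generic_make_request()'s single dispatch loop.

    A 'bio' is a (name, level) tuple, where a larger level means a
    lower-level (underlying) device.  `children` maps a bio to the bios
    its driver resubmits while handling it.  With depth_first=False we
    get the old FIFO behaviour; with depth_first=True, newly generated
    bios for lower-level devices jump ahead of the already-queued
    remainder, mimicking the patched sorting of bio_list_on_stack.
    """
    order, pending = [], [top]
    while pending:
        bio = pending.pop(0)
        order.append(bio)                     # "dispatch" this bio
        kids = children.get(bio, [])
        if depth_first:
            lower = [k for k in kids if k[1] > bio[1]]
            same = [k for k in kids if k[1] == bio[1]]
            pending = lower + same + pending  # children first
        else:
            pending = pending + kids          # old behaviour: append

    return order

# A is split into A1 and its remainder A2; handling A1 resubmits B1
# for the underlying device.
kids = {("A", 0): [("A1", 0), ("A2", 0)],
        ("A1", 0): [("B1", 1)]}

old = submit(("A", 0), kids, depth_first=False)
new = submit(("A", 0), kids, depth_first=True)
```

In the old order the remainder A2 is dispatched before A1's child B1, so a
driver that makes the remainder wait on completion of the first half can
deadlock; with the patched ordering B1 reaches the lower device first.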

Thanks,
NeilBrown
