Re: [PATCH] block: make sure big bio is splitted into at most 256 bvecs
From: Ming Lei
Date: Tue Apr 05 2016 - 21:11:17 EST
On Wed, Apr 6, 2016 at 9:04 AM, Shaohua Li <shli@xxxxxx> wrote:
> On Wed, Apr 06, 2016 at 08:47:56AM +0800, Ming Lei wrote:
>> On Wed, Apr 6, 2016 at 2:27 AM, Shaohua Li <shli@xxxxxx> wrote:
>> > On Wed, Apr 06, 2016 at 01:44:06AM +0800, Ming Lei wrote:
>> >> After arbitrary bio size is supported, the incoming bio may
>> >> be very big. We have to split the bio into small bios so that
>> >> each holds at most BIO_MAX_PAGES bvecs for safety reason, such
>> >> as bio_clone().
>> >>
>> >> This patch fixes the following kernel crash:
>> >>
>> >> > [ 172.660142] BUG: unable to handle kernel NULL pointer dereference at
>> >> > 0000000000000028
>> >> > [ 172.660229] IP: [<ffffffff811e53b4>] bio_trim+0xf/0x2a
>> >> > [ 172.660289] PGD 7faf3e067 PUD 7f9279067 PMD 0
>> >> > [ 172.660399] Oops: 0000 [#1] SMP
>> >> > [...]
>> >> > [ 172.664780] Call Trace:
>> >> > [ 172.664813] [<ffffffffa007f3be>] ? raid1_make_request+0x2e8/0xad7 [raid1]
>> >> > [ 172.664846] [<ffffffff811f07da>] ? blk_queue_split+0x377/0x3d4
>> >> > [ 172.664880] [<ffffffffa005fb5f>] ? md_make_request+0xf6/0x1e9 [md_mod]
>> >> > [ 172.664912] [<ffffffff811eb860>] ? generic_make_request+0xb5/0x155
>> >> > [ 172.664947] [<ffffffffa0445c89>] ? prio_io+0x85/0x95 [bcache]
>> >> > [ 172.664981] [<ffffffffa0448252>] ? register_cache_set+0x355/0x8d0 [bcache]
>> >> > [ 172.665016] [<ffffffffa04497d3>] ? register_bcache+0x1006/0x1174 [bcache]
>> >>
>> >> Fixes: 54efd50(block: make generic_make_request handle arbitrarily sized bios)
>> >
>> > this bug is introduced by d2be537c3ba
>> >> Reported-by: Sebastian Roesner <sroesner-kernelorg@xxxxxxxxxxxxxxxxx>
>> >> Reported-by: Eric Wheeler <bcache@xxxxxxxxxxxxxxxxxx>
>> >> Cc: stable@xxxxxxxxxxxxxxx (4.2+)
>> >> Cc: Shaohua Li <shli@xxxxxx>
>> >> Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxxxxx>
>> >> ---
>> >> I can reproduce the issue and verify the fix by the following approach:
>> >> - create one raid1 over two virtio-blk
>> >> - build bcache device over the above raid1 and another cache device.
>> >> - set cache mode as writeback
>> >> - run random write over ext4 on the bcache device
>> >> - then the crash can be triggered
>> >
>> > can you explain why this is better than my original patch?
>>
>> Shaohua, your patch is wrong, please see the following link:
>>
>> https://lkml.org/lkml/2016/3/30/893
>
> I don't see why, really, except it declares you are right :)
Never mind, I post it again, and maybe cause my poor english, :-)
blk_rq_get_max_sectors() which uses max sectors limit is used for
merging bios/reqs, and that means limits.max_sectors is for limitting
max sectors in one request or transfer. One request may include lots of
bios. Now this patch decreases the limit just for single bio's 256
bvec's limitation.
Is that correct? That is the reason why I suggest to change get_max_io_size()
for bio's 256 bvecs limit.
On the contrary, the default max sectors should have been increased
since hardware is becoming quicker, and we should send more to drive
in one request, IMO.
>
> why it's 2560 instead of 2048?
I don't know the exact reason why Jeff takes 2560, but I feel it can be
bigger because the hardware is becoming quicker.
Thanks,
Ming
> --
> To unsubscribe from this list: send the line "unsubscribe linux-block" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html