Re: [PATCH] block: copy bi_vcnt in __bio_clone_fast

From: Ming Lei
Date: Thu Oct 09 2014 - 21:24:59 EST


On Fri, Oct 10, 2014 at 3:12 AM, Jens Axboe <axboe@xxxxxxxxx> wrote:
> On 10/09/2014 11:58 AM, Jeff Mahoney wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> On 10/9/14, 12:13 PM, Ming Lei wrote:
>>> On Thu, Oct 9, 2014 at 11:25 PM, Ming Lei <ming.lei@xxxxxxxxxxxxx>
>>> wrote:
>>>> On Thu, Oct 9, 2014 at 10:26 PM, Jeff Mahoney <jeffm@xxxxxxxx>
>>>> wrote:
>>>>> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
>>>>>
>>>>> On 10/9/14, 9:53 AM, Jeff Moyer wrote:
>>>>>> Jeff Mahoney <jeffm@xxxxxxxx> writes:
>>>>>>
>>>>>>> Commit 05f1dd53152173 (block: add queue flag for disabling
>>>>>>> SG merging) uses bi_vcnt to assign bio->bi_phys_segments if
>>>>>>> sg merging is disabled. When using device mapper on top of
>>>>>>> a blk-mq device (virtio_blk in my test), we'd end up
>>>>>>> overflowing the scatterlist in __blk_bios_map_sg.
>>>>>>>
>>>>>>> __bio_clone_fast copies bi_iter and bi_io_vec but not
>>>>>>> bi_vcnt, so blk_recount_segments would report
>>>>>>> bi_phys_segments as 0. Since rq->nr_phys_segments is 0 as
>>>>>>> well, the checks to ensure that we don't exceed the queue's
>>>>>>> segment limit end up allowing more bios (and segments) to
>>>>>>> attach the a request until we finally map it. That also
>>>>>>> means we pass the BUG_ON at the beginning of
>>>>>>> virtio_queue_rq, ultimately causing memory corruption and
>>>>>>> a crash.
>>>>>>>
>>>>>>> If we copy bi_vcnt in __bio_clone_fast, the bios and
>>>>>>> requests properly report the number of segments and
>>>>>>> everything works as expected.
>>>>>>>
>>>>>>> Originally reported at
>>>>>>> http://bugzilla.opensuse.org/show_bug.cgi?id=888259
>>>>>>
>>>>>> Hi, Jeff,
>>>>>>
>>>>>> Did you manage to reproduce this problem with commit 0738854
>>>>>> (blk-merge: fix blk_recount_segments) applied? Or perhaps
>>>>>> with commit 200612e (dm table: propagate
>>>>>> QUEUE_FLAG_NO_SG_MERGE)?
>>>>>
>>>>> Yep. I was able to reproduce it with 3.17. I did try 0738854
>>>>> when I was still using 3.16 since it looked like a good
>>>>> candidate. Neither of those patches affect the problem here.
>>>>> bio->bi_phys_segments never gets a value set in the fast clone
>>>>> case and that translates to req->nr_phys_segments never getting
>>>>> properly accumulated. That might not be a problem except that
>>>>> the NO_SG_MERGE behavior bypasses the iteration that would come
>>>>> up with the correct value. In either case,
>>>>
>>>> This patch may get incorrect rq->nr_phys_segments because bio
>>>> cloning is often used in case of I/O splitting, so could you test
>>>> if the attached patch fixes your problem?
>>
>> Ah. Right.
>>
>>> Please ignore last patch and test the attached patch since we still
>>> should use no sg merge to recalculate req's segments for cloned
>>> bio.
>>
>> Yep, this fix works as expected.
>
> Thanks Ming, I've queued it up (and added Jeff's tested-by).

Sorry, looks my patch is still wrong:

- merge should be done if bio->bi_vcnt is bigger than max segment
- if one bio has been cloned from, bi_vcnt can't be relied on too

So that means it may be incorrect to use bio->bi_vcnt in
blk_recalc_rq_segments(), but seems like using bio_segments()
is a bit expensive for NO_SG_MERGE.

Thanks,
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/