Re: [PATCH v8] bio: limit bio max size
From: Damien Le Moal
Date: Fri Apr 23 2021 - 07:36:51 EST
On 2021/04/21 19:05, Changheun Lee wrote:
> bio size can grow up to 4GB when muli-page bvec is enabled.
> but sometimes it would lead to inefficient behaviors.
> in case of large chunk direct I/O, - 32MB chunk read in user space -
> all pages for 32MB would be merged to a bio structure if the pages
> physical addresses are contiguous. it makes some delay to submit
> until merge complete. bio max size should be limited to a proper size.
And what is the "proper size" ? You should be specific here and mention
max_sectors or max_hw_sectors.
>
> When 32MB chunk read with direct I/O option is coming from userspace,
> kernel behavior is below now in do_direct_IO() loop. it's timeline.
>
> | bio merge for 32MB. total 8,192 pages are merged.
> | total elapsed time is over 2ms.
> |------------------ ... ----------------------->|
> | 8,192 pages merged a bio.
> | at this time, first bio submit is done.
> | 1 bio is split to 32 read request and issue.
> |--------------->
> |--------------->
> |--------------->
> ......
> |--------------->
> |--------------->|
> total 19ms elapsed to complete 32MB read done from device. |
>
> If bio max size is limited with 1MB, behavior is changed below.
>
> | bio merge for 1MB. 256 pages are merged for each bio.
> | total 32 bio will be made.
> | total elapsed time is over 2ms. it's same.
> | but, first bio submit timing is fast. about 100us.
> |--->|--->|--->|---> ... -->|--->|--->|--->|--->|
> | 256 pages merged a bio.
> | at this time, first bio submit is done.
> | and 1 read request is issued for 1 bio.
> |--------------->
> |--------------->
> |--------------->
> ......
> |--------------->
> |--------------->|
> total 17ms elapsed to complete 32MB read done from device. |
>
> As a result, read request issue timing is faster if bio max size is limited.
> Current kernel behavior with multipage bvec, super large bio can be created.
> And it lead to delay first I/O request issue.
The above message describes the problem very well, but there is no explanation
of the solution implemented. So one cannot check if the implementation matches
the intent. Please add a description of the solution, more detailed than just
saying "limit bio size".
>
> Signed-off-by: Changheun Lee <nanich.lee@xxxxxxxxxxx>
> ---
> block/bio.c | 9 ++++++++-
> block/blk-settings.c | 5 +++++
> include/linux/bio.h | 4 +++-
> include/linux/blkdev.h | 2 ++
> 4 files changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/block/bio.c b/block/bio.c
> index 50e579088aca..9e5061ecc317 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -255,6 +255,13 @@ void bio_init(struct bio *bio, struct bio_vec *table,
> }
> EXPORT_SYMBOL(bio_init);
>
> +unsigned int bio_max_size(struct bio *bio)
> +{
> + struct request_queue *q = bio->bi_bdev->bd_disk->queue;
> +
> + return q->limits.bio_max_bytes;
> +}
> +
> /**
> * bio_reset - reinitialize a bio
> * @bio: bio to reset
> @@ -866,7 +873,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
> struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
>
> if (page_is_mergeable(bv, page, len, off, same_page)) {
> - if (bio->bi_iter.bi_size > UINT_MAX - len) {
> + if (bio->bi_iter.bi_size > bio_max_size(bio) - len) {
> *same_page = false;
> return false;
> }
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index b4aa2f37fab6..cd3dcb5afe50 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -37,6 +37,7 @@ EXPORT_SYMBOL_GPL(blk_queue_rq_timeout);
> */
> void blk_set_default_limits(struct queue_limits *lim)
> {
> + lim->bio_max_bytes = UINT_MAX;
> lim->max_segments = BLK_MAX_SEGMENTS;
> lim->max_discard_segments = 1;
> lim->max_integrity_segments = 0;
> @@ -168,6 +169,10 @@ void blk_queue_max_hw_sectors(struct request_queue *q, unsigned int max_hw_secto
> limits->logical_block_size >> SECTOR_SHIFT);
> limits->max_sectors = max_sectors;
>
> + if (check_shl_overflow(max_sectors, SECTOR_SHIFT,
> + &limits->bio_max_bytes))
> + limits->bio_max_bytes = UINT_MAX;
limits->bio_max_bytes = min_t(sector_t,
(sector_t)max_sectors << SECTOR_SHIFT, UINT_MAX);
is easier to understand in my opinion.
> +
> q->backing_dev_info->io_pages = max_sectors >> (PAGE_SHIFT - 9);
> }
> EXPORT_SYMBOL(blk_queue_max_hw_sectors);
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index d0246c92a6e8..e5add63da3af 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -106,6 +106,8 @@ static inline void *bio_data(struct bio *bio)
> return NULL;
> }
>
> +extern unsigned int bio_max_size(struct bio *bio);
No need for extern.
> +
> /**
> * bio_full - check if the bio is full
> * @bio: bio to check
> @@ -119,7 +121,7 @@ static inline bool bio_full(struct bio *bio, unsigned len)
> if (bio->bi_vcnt >= bio->bi_max_vecs)
> return true;
>
> - if (bio->bi_iter.bi_size > UINT_MAX - len)
> + if (bio->bi_iter.bi_size > bio_max_size(bio) - len)
> return true;
>
> return false;
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 158aefae1030..c205d60ac611 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -312,6 +312,8 @@ enum blk_zoned_model {
> };
>
> struct queue_limits {
> + unsigned int bio_max_bytes;
Please move this below in the structure together with all other fields that are
unsigned int too.
> +
> unsigned long bounce_pfn;
> unsigned long seg_boundary_mask;
> unsigned long virt_boundary_mask;
>
As far as I can tell, bio_max_bytes is UINT_MAX by default, becomes equal to
max_hw_sectors when that limit is set by a driver and cannot take any other
value. So why introduce this new limit at all ? Why not use max_hw_sectors
directly as a bio size limit ?
--
Damien Le Moal
Western Digital Research