Re: [PATCH v1 00/54] block: support multipage bvec

From: Ming Lei
Date: Sun Jan 15 2017 - 22:19:37 EST


Hi Guys,

On Tue, Dec 27, 2016 at 11:55 PM, Ming Lei <tom.leiming@xxxxxxxxx> wrote:
> Hi,
>
> This patchset brings multipage bvec into block layer. Basic
> xfstests(-a auto) over virtio-blk/virtio-scsi have been run
> and no regression is found, so it should be good enough
> to show the approach now, and any comments are welcome!
>
> 1) what is multipage bvec?
>
> Multipage bvec means that one 'struct bio_vec' can hold
> multiple physically contiguous pages, instead of the
> single page the Linux kernel has used for a long time.
>
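To make the definition concrete, here is a minimal user-space sketch of the idea, not the code in the patchset. The names `model_bvec`, `model_merge_bvecs` and `MODEL_PAGE_SIZE` are hypothetical, and pages are modelled by page frame numbers rather than `struct page`. It merges single-page entries into one entry whenever the next entry starts exactly where the previous one ends in physical memory:

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u /* assumed page size for this sketch */

/* Hypothetical stand-in for a bvec; NOT the kernel's struct bio_vec. */
struct model_bvec {
    unsigned long pfn;   /* page frame number of the first page */
    unsigned int offset; /* byte offset into the first page */
    unsigned int len;    /* length in bytes; may span several pages */
};

/*
 * Merge n single-page entries from 'in' into multipage entries in 'out'.
 * 'out' must have room for n entries. Returns the number of entries used.
 */
size_t model_merge_bvecs(const struct model_bvec *in, size_t n,
                         struct model_bvec *out)
{
    size_t used = 0;

    assert(n == 0 || (in != NULL && out != NULL));

    for (size_t i = 0; i < n; i++) {
        if (used > 0) {
            struct model_bvec *last = &out[used - 1];
            /* Physical byte address just past the previous entry. */
            unsigned long long end =
                (unsigned long long)last->pfn * MODEL_PAGE_SIZE +
                last->offset + last->len;
            /* Physical byte address where this entry starts. */
            unsigned long long start =
                (unsigned long long)in[i].pfn * MODEL_PAGE_SIZE +
                in[i].offset;

            if (end == start) {
                /* Physically contiguous: extend the previous entry. */
                last->len += in[i].len;
                continue;
            }
        }
        out[used++] = in[i];
    }
    return used;
}
```

Under these assumptions, four full pages at pfns 100, 101, 102 and 200 collapse into two entries: one of three pages starting at pfn 100, and one single page at pfn 200.
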
> 2) why is multipage bvec introduced?
>
> Kent proposed the idea[1] first.
>
> As system RAM has grown much larger, and huge pages,
> transparent huge pages and memory compaction are now
> widely used, it is fairly common to see physically
> contiguous pages coming from the fs in I/O.
> On the other hand, from the block layer's view, it isn't
> necessary to store the intermediate pages in the bvec;
> it is enough to store just the physically contiguous
> 'segment'.
>
> Also, huge pages are being brought to filesystems[2]. To
> do IO one huge page at a time[3], one bio must be able to
> transfer at least one huge page at a time. It turns out that
> simply changing BIO_MAX_PAGES isn't flexible[3]. Multipage
> bvec fits this case very well.
>
> With multipage bvec:
>
> - bio size can be increased, which should in theory improve
> some high-bandwidth IO cases[4].
>
> - Inside block layer, both bio splitting and sg map can
> become more efficient than before by just traversing the
> physically contiguous 'segment' instead of each page.
>
> - in the future, the memory footprint of bvec usage may
> be reduced.
>
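A rough back-of-the-envelope sketch of the bio size point, in user-space C. The limit of 256 bvecs per bio and the 4 KiB page size are assumptions of this sketch (the cover letter only names BIO_MAX_PAGES, not its value), and all `model_*` names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE    4096u /* assumed page size */
#define MODEL_BIO_MAX_VECS 256u  /* assumed per-bio bvec limit */

/* Single-page bvecs needed for 'len' bytes starting at 'offset' in a page. */
size_t model_sp_bvecs_needed(size_t offset, size_t len)
{
    assert(offset < MODEL_PAGE_SIZE);
    if (len == 0)
        return 0;
    /* Round the covered byte range up to whole pages. */
    return (offset + len + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
}

/* A physically contiguous buffer needs only one multipage bvec. */
size_t model_mp_bvecs_needed(size_t len)
{
    return len ? 1 : 0;
}

/* Does a bvec table of this size fit in one bio under the assumed limit? */
int model_fits_in_one_bio(size_t nr_bvecs)
{
    return nr_bvecs <= MODEL_BIO_MAX_VECS;
}
```

With these numbers, a 2 MiB huge page needs 512 single-page bvecs and so does not fit in one bio, while the same buffer described as one multipage bvec fits easily.
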
> 3) how is multipage bvec implemented in this patchset?
>
> The first 9 patches comment on some special cases. As we saw,
> most cases turn out to be safe for multipage bvec;
> only fs/buffer, MD and btrfs need to be dealt with. Both fs/buffer
> and btrfs are handled in the following patches, based on some
> new block APIs for multipage bvec.
>
> Since a little more work is needed to clean up MD, this patchset
> introduces QUEUE_FLAG_NO_MP for it, so this component can still
> see/use singlepage bvecs. In the future, once the cleanup is done,
> the flag can be removed.
>
> The 2nd part (23 ~ 54) implements multipage bvec in block:
>
> - put all tricks into bvec/bio/rq iterators, and as long as
> drivers and fs use these standard iterators, they work
> with multipage bvec
>
> - bio_for_each_segment_all() changes
> this helper passes a pointer to each bvec directly to the user, so
> it has to be changed. Two new helpers (bio_for_each_segment_all_sp()
> and bio_for_each_segment_all_mp()) are introduced.
>
> Also convert current users of bio_for_each_segment_all() to the
> above two.
>
> - bio_clone() changes
> By default bio_clone() still clones the new bio in the multipage
> bvec way. A single page version of bio_clone() is also introduced
> for special cases where only single page bvecs are used
> for the newly cloned bio (bio bounce, ...)
>
> - btrfs cleanup
> just three patches for avoiding direct access to bvec table.
>
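As a rough illustration of what a single-page ("sp") view over a multipage bvec has to compute, here is a user-space sketch. It is not the patchset's iterator code: `model_bvec`, `model_next_sp` and `MODEL_PAGE_SIZE` are hypothetical names, and pages are modelled by page frame numbers. Each call returns the next piece that lies within a single page:

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u /* assumed page size for this sketch */

/* Hypothetical stand-in for a bvec; NOT the kernel's struct bio_vec. */
struct model_bvec {
    unsigned long pfn;   /* page frame number of the first page */
    unsigned int offset; /* byte offset into the first page */
    unsigned int len;    /* length in bytes; may span several pages */
};

/*
 * Produce the next single-page piece of the multipage entry 'mp'.
 * '*done' counts the bytes already returned and must start at 0.
 * Returns 1 and fills '*sp' if a piece remains, 0 when finished.
 */
int model_next_sp(const struct model_bvec *mp, unsigned int *done,
                  struct model_bvec *sp)
{
    unsigned int pos, in_page, room, left;

    assert(mp != NULL && done != NULL && sp != NULL);

    if (*done >= mp->len)
        return 0;

    pos = mp->offset + *done;         /* bytes from start of first page */
    in_page = pos % MODEL_PAGE_SIZE;  /* offset inside the current page */
    room = MODEL_PAGE_SIZE - in_page; /* bytes left in the current page */
    left = mp->len - *done;           /* bytes left in the whole entry */

    sp->pfn = mp->pfn + pos / MODEL_PAGE_SIZE;
    sp->offset = in_page;
    sp->len = left < room ? left : room;

    *done += sp->len;
    return 1;
}
```

For an entry starting at pfn 10 with offset 512 and length 8192, this yields three pieces: 3584 bytes in pfn 10, a full 4096 bytes in pfn 11, and 512 bytes in pfn 12. A multipage ("mp") view would hand back the original entry unchanged in one step.
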
> These patches can be found in the following git tree:
>
> https://github.com/ming1/linux/commits/mp-bvec-0.6-v4.10-rc
>
> Thanks to Christoph for looking at the early version and providing
> very good suggestions, such as: introduce bio_init_with_vec_table(),
> remove other unnecessary helpers for cleanup, and so on.
>
> TODO:
> - cleanup direct access to bvec table for MD
>
> V1:
> - rebased against v4.10-rc1; some cleanups from V0 are already in -linus
> - handle queue_virt_boundary() in mp bvec change and make NVMe happy
> - further BTRFS cleanup
> - remove QUEUE_FLAG_SPLIT_MP
> - rename for two new helpers of bio_for_each_segment_all()
> - fix bounce conversion
> - address comments in V0

Any comments on this version?

BTW, with one fix in the following link:

https://github.com/ming1/linux/commit/e52897a21b4b4c1500cc3686b8392757ebc5bd19

xfstests (ext4, xfs and btrfs) were run and no regressions were observed.

Also one new patch is introduced to cover dio over block device:

https://github.com/ming1/linux/commit/58a0f7a7f6afa74cc29d453f9b5d79304c90aa09

Thanks,
Ming

>
> [1], http://marc.info/?l=linux-kernel&m=141680246629547&w=2
> [2], https://patchwork.kernel.org/patch/9451523/
> [3], http://marc.info/?t=147735447100001&r=1&w=2
> [4], http://marc.info/?l=linux-mm&m=147745525801433&w=2
>
>
> Ming Lei (54):
> block: drbd: comment on direct access bvec table
> block: loop: comment on direct access to bvec table
> kernel/power/swap.c: comment on direct access to bvec table
> mm: page_io.c: comment on direct access to bvec table
> fs/buffer: comment on direct access to bvec table
> f2fs: f2fs_read_end_io: comment on direct access to bvec table
> bcache: comment on direct access to bvec table
> block: comment on bio_alloc_pages()
> block: comment on bio_iov_iter_get_pages()
> block: introduce flag QUEUE_FLAG_NO_MP
> md: set NO_MP for request queue of md
> dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE
> block: comments on bio_for_each_segment[_all]
> block: introduce multipage/single page bvec helpers
> block: implement sp version of bvec iterator helpers
> block: introduce bio_for_each_segment_mp()
> block: introduce bio_clone_sp()
> bvec_iter: introduce BVEC_ITER_ALL_INIT
> block: bounce: avoid direct access to bvec table
> block: bounce: don't access bio->bi_io_vec in copy_to_high_bio_irq
> block: introduce bio_can_convert_to_sp()
> block: bounce: convert multipage bvecs into singlepage
> bcache: handle bio_clone() & bvec updating for multipage bvecs
> blk-merge: compute bio->bi_seg_front_size efficiently
> block: blk-merge: try to make front segments in full size
> block: blk-merge: remove unnecessary check
> block: use bio_for_each_segment_mp() to compute segments count
> block: use bio_for_each_segment_mp() to map sg
> block: introduce bvec_for_each_sp_bvec()
> block: bio: introduce single/multi page version of
> bio_for_each_segment_all()
> block: introduce bio_segments_all()
> block: introduce bvec_get_last_sp()
> block: deal with dirtying pages for multipage bvec
> block: convert to singe/multi page version of
> bio_for_each_segment_all()
> bcache: convert to bio_for_each_segment_all_sp()
> dm-crypt: don't clear bvec->bv_page in crypt_free_buffer_pages()
> dm-crypt: convert to bio_for_each_segment_all_sp()
> md/raid1.c: convert to bio_for_each_segment_all_sp()
> fs/mpage: convert to bio_for_each_segment_all_sp()
> fs/direct-io: convert to bio_for_each_segment_all_sp()
> ext4: convert to bio_for_each_segment_all_sp()
> xfs: convert to bio_for_each_segment_all_sp()
> gfs2: convert to bio_for_each_segment_all_sp()
> f2fs: convert to bio_for_each_segment_all_sp()
> exofs: convert to bio_for_each_segment_all_sp()
> fs: crypto: convert to bio_for_each_segment_all_sp()
> fs/btrfs: convert to bio_for_each_segment_all_sp()
> fs/block_dev.c: convert to bio_for_each_segment_all_sp()
> fs/iomap.c: convert to bio_for_each_segment_all_sp()
> fs/buffer.c: use bvec iterator to truncate the bio
> btrfs: avoid access to .bi_vcnt directly
> btrfs: use bvec_get_last_sp to get the last singlepage bvec
> btrfs: comment on direct access bvec table
> block: enable multipage bvecs
>
> block/bio.c | 110 +++++++++++++++----
> block/blk-merge.c | 227 +++++++++++++++++++++++++++++++--------
> block/blk-zoned.c | 5 +-
> block/bounce.c | 75 +++++++++----
> drivers/block/drbd/drbd_bitmap.c | 1 +
> drivers/block/loop.c | 5 +
> drivers/md/bcache/btree.c | 4 +-
> drivers/md/bcache/debug.c | 30 +++++-
> drivers/md/bcache/super.c | 6 ++
> drivers/md/bcache/util.c | 7 ++
> drivers/md/dm-crypt.c | 4 +-
> drivers/md/dm.c | 11 +-
> drivers/md/md.c | 12 +++
> drivers/md/raid1.c | 3 +-
> fs/block_dev.c | 6 +-
> fs/btrfs/check-integrity.c | 12 ++-
> fs/btrfs/compression.c | 12 ++-
> fs/btrfs/disk-io.c | 3 +-
> fs/btrfs/extent_io.c | 26 +++--
> fs/btrfs/extent_io.h | 1 +
> fs/btrfs/file-item.c | 6 +-
> fs/btrfs/inode.c | 34 ++++--
> fs/btrfs/raid56.c | 6 +-
> fs/buffer.c | 24 +++--
> fs/crypto/crypto.c | 3 +-
> fs/direct-io.c | 4 +-
> fs/exofs/ore.c | 3 +-
> fs/exofs/ore_raid.c | 3 +-
> fs/ext4/page-io.c | 3 +-
> fs/ext4/readpage.c | 3 +-
> fs/f2fs/data.c | 13 ++-
> fs/gfs2/lops.c | 3 +-
> fs/gfs2/meta_io.c | 3 +-
> fs/iomap.c | 3 +-
> fs/mpage.c | 3 +-
> fs/xfs/xfs_aops.c | 3 +-
> include/linux/bio.h | 164 ++++++++++++++++++++++++++--
> include/linux/blk_types.h | 6 ++
> include/linux/blkdev.h | 2 +
> include/linux/bvec.h | 138 ++++++++++++++++++++++--
> kernel/power/swap.c | 2 +
> mm/page_io.c | 2 +
> 42 files changed, 829 insertions(+), 162 deletions(-)
>
> --
> 2.7.4
>



--
Ming Lei