On 09/17/2015 09:50 AM, Ming Lei wrote:
On Thu, Sep 17, 2015 at 11:19 PM, Jens Axboe <axboe@xxxxxxxxx> wrote:
On 09/17/2015 09:13 AM, Ming Lei wrote:
Biovecs have been immutable since v3.13, so it isn't necessary
to allocate biovecs for the newly cloned bios. This saves one
extra biovec allocation/copy per clone, and that allocation is
often not fixed-length, which makes it a bit more expensive.
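To illustrate the idea, here is a minimal userspace sketch of the
difference between a copying clone and a "fast" clone that shares the
parent's immutable table. The types and function names below are
simplified stand-ins, not the actual block-layer API or the patch
itself:

#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for struct bio / struct bio_vec; these are
 * not the kernel types. */
struct vec {
	void *page;
	unsigned int len;
	unsigned int offset;
};

struct mini_bio {
	struct vec *io_vec;	/* the (immutable) biovec table */
	unsigned int vcnt;	/* entries in the table */
	unsigned int iter_idx;	/* per-bio iterator over the table */
};

/* Copying clone: allocate a fresh table and copy every vec. */
struct mini_bio *clone_full(const struct mini_bio *src)
{
	struct mini_bio *b = malloc(sizeof(*b));

	if (!b)
		return NULL;
	b->io_vec = malloc(src->vcnt * sizeof(*b->io_vec));
	if (!b->io_vec) {
		free(b);
		return NULL;
	}
	memcpy(b->io_vec, src->io_vec, src->vcnt * sizeof(*b->io_vec));
	b->vcnt = src->vcnt;
	b->iter_idx = src->iter_idx;
	return b;
}

/* Fast clone: the table is never modified after submission, so the
 * clone can simply share the parent's pointer and keep only its own
 * iterator state. No per-clone table allocation or copy. */
struct mini_bio *clone_fast(const struct mini_bio *src)
{
	struct mini_bio *b = malloc(sizeof(*b));

	if (!b)
		return NULL;
	b->io_vec = src->io_vec;	/* shared with the parent */
	b->vcnt = src->vcnt;
	b->iter_idx = src->iter_idx;	/* advances independently */
	return b;
}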
For example, if 'max_sectors_kb' of the null_blk queue is set
to 16 (32 sectors) via sysfs just to force more splits, this patch
increases throughput by about 70% in a sequential read test over
null_blk (direct I/O, bs=1M).
I'd be curious how this compares to the behavior before we did the
splitting, when submitters stayed within the limits through
bio_add_page() instead?
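For context, in the pattern Jens refers to, the submitter probed the
queue limit page by page and started a new bio whenever bio_add_page()
refused, so bios never exceeded the limit and never needed splitting.
A rough sketch of that pattern, in the same simplified userspace style
as above (add_page(), submit(), and MAX_BIO_BYTES are stand-ins, not
the kernel API):

#include <stdio.h>

#define MAX_BIO_BYTES	(16 * 1024)	/* stand-in for max_sectors_kb=16 */
#define MAX_VECS	256

struct vec { void *page; unsigned int len, offset; };

struct mini_bio {
	struct vec io_vec[MAX_VECS];
	unsigned int vcnt;
	unsigned int bytes;
};

/* Mimics bio_add_page()'s contract: returns bytes added, or 0 if the
 * page would push the bio past the limit, telling the caller to
 * submit first and retry on a fresh bio. */
unsigned int add_page(struct mini_bio *b, struct vec v)
{
	if (b->vcnt == MAX_VECS || b->bytes + v.len > MAX_BIO_BYTES)
		return 0;
	b->io_vec[b->vcnt++] = v;
	b->bytes += v.len;
	return v.len;
}

void submit(struct mini_bio *b)
{
	printf("submit bio: %u vecs, %u bytes\n", b->vcnt, b->bytes);
	b->vcnt = 0;
	b->bytes = 0;
}

/* Pre-splitting pattern: every bio already fits the queue limit when
 * it is submitted, so no splitting (and no cloning) happens later. */
int main(void)
{
	struct mini_bio b = { .vcnt = 0, .bytes = 0 };
	unsigned int i;

	for (i = 0; i < 256; i++) {		/* 256 x 4KB = 1MB of IO */
		struct vec v = { NULL, 4096, 0 };

		if (!add_page(&b, v)) {
			submit(&b);		/* full: 16KB, 4 vecs */
			add_page(&b, v);	/* retry on the fresh bio */
		}
	}
	if (b.bytes)
		submit(&b);
	return 0;
}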
Let me show these test results:
----------------------------------------------------------------------
kernel                              | throughput
----------------------------------------------------------------------
4.3.0-rc1-next-20150916             | bw=12227MB/s, iops=12227
----------------------------------------------------------------------
4.3.0-rc1-next-20150916 with patch  | bw=21011MB/s, iops=21011
----------------------------------------------------------------------
v4.2                                | bw=18959MB/s, iops=18958
----------------------------------------------------------------------
So from the above, it looks like this patch is also a fix for the
performance regression introduced by 54efd50bfd ("block: make
generic_make_request handle arbitrarily sized bios"), :-)
So that's 1MB user IO against a 16KB device limit, correct? If that is
the case, then the results make sense: each 1MB bio gets split into 64
pieces, so the per-split savings add up. And it looks like we're still
ahead of the older bio_add_page() approach, which is what I mostly
cared about.
Thanks! I'll apply this for -rc2.