On Tue, Jun 16, 2015 at 6:28 AM, juncheng bai
<baijuncheng@xxxxxxxxxxxxxxx> wrote:
Hi. I updated the kernel to version 4.0.5. The test information is shown below:
On 2015/6/15 22:27, Ilya Dryomov wrote:
On Mon, Jun 15, 2015 at 4:23 PM, juncheng bai
<baijuncheng@xxxxxxxxxxxxxxx> wrote:
On 2015/6/15 21:03, Ilya Dryomov wrote:
On Mon, Jun 15, 2015 at 2:18 PM, juncheng bai
<baijuncheng@xxxxxxxxxxxxxxx> wrote:
From 6213215bd19926d1063d4e01a248107dab8a899b Mon Sep 17 00:00:00 2001
From: juncheng bai <baijuncheng@xxxxxxxxxxxxxxx>
Date: Mon, 15 Jun 2015 18:34:00 +0800
Subject: [PATCH] storage: rbd: make the size of the request equal to
the size of the object

Ensures that the merged size of a request can reach the size of the
object. When merging a bio into a request, or a request into another
request, the sum of the segment count of the current request and the
segment count of the bio must not be greater than the max segments of
the request, so the max size of a request is 512k if the max segments
of the request is BLK_MAX_SEGMENTS.
Signed-off-by: juncheng bai <baijuncheng@xxxxxxxxxxxxxxx>
---
drivers/block/rbd.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 0a54c58..dec6045 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -3757,6 +3757,8 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
segment_size = rbd_obj_bytes(&rbd_dev->header);
blk_queue_max_hw_sectors(q, segment_size / SECTOR_SIZE);
blk_queue_max_segment_size(q, segment_size);
+ if (segment_size > BLK_MAX_SEGMENTS * PAGE_SIZE)
+ blk_queue_max_segments(q, segment_size / PAGE_SIZE);
blk_queue_io_min(q, segment_size);
blk_queue_io_opt(q, segment_size);
I made a similar patch on Friday, investigating the blk-mq plugging
issue reported by Nick. My patch sets it to BIO_MAX_PAGES
unconditionally - AFAIU there is no point in setting it to anything
bigger, since the bios will be clipped to that number of vecs. Given
that BIO_MAX_PAGES is 256, this gives us 1M direct I/Os.
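
A minimal sketch of that alternative - the actual patch isn't quoted
in this thread, so this is an assumption about its shape:

/* a single bio carries at most BIO_MAX_PAGES (256) bio_vecs,
 * i.e. 1M with 4K pages, so a larger cap buys nothing */
blk_queue_max_segments(q, BIO_MAX_PAGES);

in rbd_init_disk(), in place of the conditional above.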
Hi. For a single bio, the max number of bio_vecs is BIO_MAX_PAGES, but
a request can be merged from multiple bios. See, for example,
ll_back_merge_fn, ll_front_merge_fn, etc.
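
A simplified sketch of the check those merge functions perform - not
the exact kernel code, just its shape against the 3.18/4.0-era
helpers:

/* merging is allowed only while the combined physical segment
 * count stays within the queue limit; with the default
 * BLK_MAX_SEGMENTS (128) and 4K pages that caps a request at
 * 512K, even though each bio may carry up to 1M */
static bool can_merge_segments(struct request *req, struct bio *bio)
{
	return req->nr_phys_segments +
	       bio_phys_segments(req->q, bio) <=
	       queue_max_segments(req->q);
}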
And I tested this patch on kernel 3.18, doing:
echo 4096 > /sys/block/rbd0/queue/max_sectors_kb
We used systemtap to trace the request size; it goes up to 4M.
Kernel 3.18 predates the rbd blk-mq transition, which happened in 4.0.
You should test whatever patches you have with at least 4.0.
Putting that aside, I must be missing something. You'll get 4M
requests on 3.18 both with your patch and without it; the only
difference would be the size of the bios being merged - 512k vs 1M.
Can you describe your test workload and provide before and after
traces?
The base information:
03:28:13-root@server-186:~$uname -r
4.0.5
My simple systemtap script:
probe module("rbd").function("rbd_img_request_create")
{
printf("offset:%lu length:%lu\n", ulong_arg(2), ulong_arg(3));
}
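For reference, it can be run with something like the following - the
filename is made up here, and -g (guru mode) may be required by some
systemtap versions for the *_arg helpers:
stap rbd_req_trace.stp
with the rbd module loaded.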
I used dd to execute the test case:
dd if=/dev/zero of=/dev/rbd0 bs=4M count=1 oflag=direct
Case one: Without patch
03:30:23-root@server-186:~$cat /sys/block/rbd0/queue/max_sectors_kb
4096
03:30:35-root@server-186:~$cat /sys/block/rbd0/queue/max_segments
128
The output of systemtap for normal data:
offset:0 length:524288
offset:524288 length:524288
offset:1048576 length:524288
offset:1572864 length:524288
offset:2097152 length:524288
offset:2621440 length:524288
offset:3145728 length:524288
offset:3670016 length:524288
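That matches the default limit: 128 segments * 4K pages = 512K per
request, so dd's 4M write is split into eight 512K requests.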
Case two: With patch
cat /sys/block/rbd0/queue/max_sectors_kb
4096
03:49:14-root@server-186:linux-4.0.5$cat /sys/block/rbd0/queue/max_segments
1024
The output of systemtap for normal data:
offset:0 length:1048576
offset:1048576 length:1048576
offset:2097152 length:1048576
offset:3145728 length:1048576
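Here each request matches a single bio: bios are clipped to
BIO_MAX_PAGES (256) vecs, i.e. 1M, and as explained below a blk-mq
bug kept them from being merged further.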
According to the test, you are right.
This is because blk-mq doesn't use any scheduling policy:
03:52:13-root@server-186:linux-4.0.5$cat /sys/block/rbd0/queue/scheduler
none
In kernel versions before 4.0, rbd used the default scheduler, cfq.
So, I think blk-mq needs to do more?
There is no scheduler support in blk-mq as of now, but your numbers
don't have anything to do with that. The current behaviour is a result
of a bug in blk-mq. It's fixed by [1]; if you apply it, you should see
4M requests with your stap script.
[1] http://article.gmane.org/gmane.linux.kernel/1941750
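If the fix behaves as described, rerunning the same dd should
presumably produce a single entry from the stap script - this is an
expectation inferred from the above, not a captured trace:
offset:0 length:4194304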
Thanks,

                Ilya