On Thu, Jun 23, 2011 at 08:21:59PM +0400, Konstantin Khlebnikov wrote:
Commit v2.6.32-102-g8682e1f "blkio: Provide some isolation between groups" breaks
fast switching between a task and the journal thread for the very common write-fsync workload.
cfq waits out the idle slice at each cfqq switch if the task is in a non-root blkio cgroup.
This patch moves the idling sync-noidle preemption check a little bit upwards and updates
the service_tree->count check to handle the case of two different groups.
I do not quite understand what these checks on new_cfqq mean, but now it works.
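For readers following the thread, the change in the count check can be sketched as a small standalone model. This is only an illustration under stated assumptions, not the kernel code: the struct and both function names are hypothetical, the fields stand in for the kernel expressions named in the comments, and the model assumes the active queue has not used up its slice (so the cfq_slice_used() branch is left out). It also assumes that each group keeps its own service trees, so a new queue from another group would be alone on its tree (count 1), while a queue from the same group shares the tree with the active queue (count 2).

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the state cfq_should_preempt() reads. */
struct preempt_state {
	bool serving_sync_noidle; /* cfqd->serving_type == SYNC_NOIDLE_WORKLOAD */
	bool new_is_sync_noidle;  /* cfqq_type(new_cfqq) == SYNC_NOIDLE_WORKLOAD */
	bool active_queue_empty;  /* RB_EMPTY_ROOT(&cfqq->sort_list) */
	bool same_group;          /* new_cfqq->cfqg == cfqq->cfqg */
	int tree_count;           /* new_cfqq->service_tree->count */
};

/* Order before the patch: the group check runs first, so a queue from
 * another group never reaches the sync-noidle check. */
static bool noidle_preempt_old(const struct preempt_state *s)
{
	if (!s->same_group)
		return false;
	return s->serving_sync_noidle && s->new_is_sync_noidle &&
	       s->tree_count == 2 && s->active_queue_empty;
}

/* Order after the patch: the sync-noidle check runs first and expects
 * count 2 for the same group, count 1 for a different group. */
static bool noidle_preempt_patched(const struct preempt_state *s)
{
	int expected = 1 + (s->same_group ? 1 : 0);

	return s->serving_sync_noidle && s->new_is_sync_noidle &&
	       s->tree_count == expected && s->active_queue_empty;
}
```

In this model a cross-group request with count 1 is refused by the old order and allowed by the patched order, while same-group behaviour with count 2 is unchanged.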
Without the patch I get 49 iops, and with it 798, for this trivial fio script:
[write-fsync]
cgroup=test
cgroup_weight=1000
rw=write
fsync=1
size=100m
runtime=10s
What kind of storage and filesystem are you using? I tried this on a SATA
disk and I really don't get good throughput. With deadline scheduler I
get aggrb=103KB/s.
I think with fsync we are generating so many FLUSH requests that it
really slows down fsync.
Whether I use CFQ with or without cgroups, I get the following.
CFQ, without cgroup
------------------
aggrb=100KB/s
CFQ with cgroup
--------------
aggrb=94KB/s
So with FLUSH requests, not much difference in throughput for this
workload.
I guess you must be running with barriers off or something like that.
Thanks
Vivek
Signed-off-by: Konstantin Khlebnikov <khlebnikov@xxxxxxxxxx>
---
block/cfq-iosched.c | 14 +++++++-------
1 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 3c7b537..c71533e 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -3318,19 +3318,19 @@ cfq_should_preempt(struct cfq_data *cfqd, struct cfq_queue *new_cfqq,
if (rq_is_sync(rq) && !cfq_cfqq_sync(cfqq))
return true;
- if (new_cfqq->cfqg != cfqq->cfqg)
- return false;
-
- if (cfq_slice_used(cfqq))
- return true;
-
/* Allow preemption only if we are idling on sync-noidle tree */
if (cfqd->serving_type == SYNC_NOIDLE_WORKLOAD &&
cfqq_type(new_cfqq) == SYNC_NOIDLE_WORKLOAD &&
- new_cfqq->service_tree->count == 2 &&
+ new_cfqq->service_tree->count == 1+(new_cfqq->cfqg == cfqq->cfqg) &&
RB_EMPTY_ROOT(&cfqq->sort_list))
return true;
+ if (new_cfqq->cfqg != cfqq->cfqg)
+ return false;
+
+ if (cfq_slice_used(cfqq))
+ return true;
+
/*
* So both queues are sync. Let the new request get disk time if
* it's a metadata request and the current queue is doing regular IO.