Re: [PATCH] cfq-iosched: queue groups more gracefully

From: Konstantin Khlebnikov
Date: Fri Jun 24 2011 - 07:13:28 EST


Vivek Goyal wrote:
On Thu, Jun 23, 2011 at 08:22:06PM +0400, Konstantin Khlebnikov wrote:
This patch queues awakened cfq-groups according to their current vdisktime;
it tries to save up to one group timeslice of unused virtual disk time.
Thus a group does not lose everything if it was not continuously backlogged.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@xxxxxxxxxx>

I think this patch is not required till we start preemption across
groups? Any more details of the actual use case will help.

I saw some problems with fairness and latency between groups when parallel
intensive-IO and interactive groups run together -- cfq always puts interactive groups at the end,
so their latency is extremely high. With this patch interactive groups get a real chance to
be scheduled much earlier. I'm sorry, I can not show simple test-cases right now.


---
block/cfq-iosched.c | 36 ++++++++++++++++++++++++++++++------
1 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index c71533e..d5c7c79 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -592,6 +592,26 @@ cfq_group_slice(struct cfq_data *cfqd, struct cfq_group *cfqg)
return cfq_target_latency * cfqg->weight / st->total_weight;
}

+static inline u64
+cfq_group_vslice(struct cfq_data *cfqd, struct cfq_group *cfqg)
+{
+ struct cfq_rb_root *st = &cfqd->grp_service_tree;
+ u64 vslice;
+
+ /* There are no group slices in iops mode */
+ if (iops_mode(cfqd))
+ return 0;
+
+ /*
+ * Equal to cfq_scale_slice(cfq_group_slice(cfqd, cfqg), cfqg).
+ * Add the group's weight because it is not currently on the service tree.
+ */
+ vslice = (u64)cfq_target_latency << CFQ_SERVICE_SHIFT;
+ vslice *= BLKIO_WEIGHT_DEFAULT;
+ do_div(vslice, st->total_weight + cfqg->weight);

Above is not equivalent to cfq_scale_slice(cfq_group_slice(cfqd, cfqg),
cfqg) as comment says.

you are not calculating cfq_group_slice(). Instead using cfq_target_latency.

No, this expression gives the same value as cfq_scale_slice(cfq_group_slice())
after the group has been added to the service tree. It is equal to the slice that the group
would receive if it were queued immediately after the addition.


Also, it does not make sense: a higher-weight group gets a lower vslice
and in turn is put further away on the tree. This is the reverse of what
you want.

+ return vslice;
+}
+
static inline unsigned
cfq_scaled_cfqq_slice(struct cfq_data *cfqd, struct cfq_queue *cfqq)
{
@@ -884,16 +904,20 @@ cfq_group_notify_queue_add(struct cfq_data *cfqd, struct cfq_group *cfqg)
return;

/*
- * Currently put the group at the end. Later implement something
- * so that groups get lesser vtime based on their weights, so that
- * if group does not loose all if it was not continuously backlogged.
+ * Bump vdisktime to be greater or equal min_vdisktime.
+ */
+ cfqg->vdisktime = max_vdisktime(cfqg->vdisktime, st->min_vdisktime);
+

why do we need to do this?

Time should not go back, it's dangerous.


+ /*
+ * Put the group at the end, but save one slice from unused time.
*/
n = rb_last(&st->rb);
if (n) {
__cfqg = rb_entry_cfqg(n);
- cfqg->vdisktime = __cfqg->vdisktime + CFQ_IDLE_DELAY;
- } else
- cfqg->vdisktime = st->min_vdisktime;
+ cfqg->vdisktime = max_vdisktime(cfqg->vdisktime,
^^^^^^^
I think you meant st->min_vdisktime here?

No, I adjust the group's vdisktime to put it at the end, but save up to one slice.
Although there may be a problem with vdisktime wraparound, if a group wakes up after a looong sleep..

+ __cfqg->vdisktime -
+ cfq_group_vslice(cfqd, cfqg));
+ }
cfq_group_service_tree_add(st, cfqg);
}


Thanks
Vivek
