Re: IOPS based scheduler (Was: Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq)

From: Vivek Goyal
Date: Wed Apr 04 2012 - 09:37:52 EST


On Wed, Apr 04, 2012 at 05:35:49AM -0700, Shaohua Li wrote:

[..]
> >> How are iops_weight and switching different from the CFQ group scheduling
> >> logic? I think Shaohua was talking about using similar logic. What would
> >> you do fundamentally differently so that you get service differentiation
> >> without idling?
> > I am thinking of differentiating groups by iops. If there are 3 groups
> > (with weights 100, 200 and 300), we can let them submit 1 io, 2 ios and
> > 3 ios in a round-robin way. With an Intel SSD, every io can be finished
> > within 100us, so the maximum latency for one io is about 600us, still
> > less than 1ms. But with cfq, if all the cgroups are busy, we have to
> > switch between these groups on a millisecond scale, which means the
> > maximum latency will be 6ms. That is terrible for some applications,
> > since they use SSDs now.
> Yes, with iops based scheduling, we do queue switching for every request.
> Doing the same thing between groups is quite straightforward. The only issue
> I found is that this will introduce more process context switches. This isn't
> a big issue for io bound applications, but it depends. It cuts latency a lot,
> which I guess is more important for web 2.0 applications.
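
The arithmetic in the quoted proposal can be checked with a small sketch.
This is only a toy model, not kernel code: the group names A, B and C, the
helper names, and the assumption that requests are serviced strictly one at
a time (queue depth 1, 100us each) are all made up for illustration.

```python
from functools import reduce
from math import gcd


def quanta(weights):
    """Requests each group may submit per round: weight / common divisor."""
    divisor = reduce(gcd, weights.values())
    return {name: weight // divisor for name, weight in weights.items()}


def one_round(weights):
    """Dispatch order for a single round when every group is busy."""
    order = []
    for name, count in quanta(weights).items():
        order.extend([name] * count)
    return order


def round_length_us(weights, service_us):
    """Duration of one full round, assuming serialized requests.

    A request that just missed its group's turn waits roughly this long.
    """
    return sum(quanta(weights).values()) * service_us


# Hypothetical groups with the weights from the quoted example.
weights = {"A": 100, "B": 200, "C": 300}
```

With these weights a round is A, B, B, C, C, C, that is six requests, so
at 100us per request a full round takes 600us, matching the figure quoted
above.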

In iops_mode(), expire each cfqq after dispatching one request or a small
batch of requests, and you should get the same behavior (with slice_idle=0
and group_idle=0). So why write a new scheduler?

The only catch is that, with the above, the current code will provide iops
fairness only between groups. We should be able to tweak queue scheduling to
support iops fairness as well.
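
As a rough illustration of why expiring an entity after every request gives
weight-proportional iops, here is a toy virtual-time model. It is not CFQ
code: the min-heap standing in for the service tree, the charge of 1/weight
per dispatched request, and the names A, B and C are assumptions made for
this sketch, and every entity is assumed to be always backlogged.

```python
import heapq
from fractions import Fraction


def dispatch(weights, total_requests):
    """Serve one request at a time from the entity with the lowest vtime.

    Each served request advances the entity's virtual time by 1/weight,
    so heavier entities come back to the front of the tree sooner.
    Returns (requests served per entity, dispatch order).
    """
    # (vtime, name) pairs; ties on vtime are broken by name.
    tree = [(Fraction(0), name) for name in sorted(weights)]
    heapq.heapify(tree)
    served = {name: 0 for name in weights}
    order = []
    for _ in range(total_requests):
        vtime, name = heapq.heappop(tree)   # pick the leftmost entity
        served[name] += 1
        order.append(name)
        # "Expire" after a single request and requeue with the new vtime.
        heapq.heappush(tree, (vtime + Fraction(1, weights[name]), name))
    return served, order


# Hypothetical entities; they could equally be queues or groups.
weights = {"A": 100, "B": 200, "C": 300}
served, order = dispatch(weights, 600)
```

In this model 600 dispatches split 100/200/300 across A, B and C, and no
entity holds the device for more than one request at a time. The same loop
works whether the entities are queues or groups, which is the point of
putting both on one tree.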

Anyway, we will end up doing that at some point. Supporting two different
scheduling algorithms for queues and groups is not sustainable. There are
already calls to make CFQ hierarchical, and in that case both queues and
groups need to be on a single service tree, which means they need to follow
the same scheduling algorithm.

Thanks
Vivek