Re: [PATCH 0/8 v2] Introduce CFQ group hierarchical scheduling and "use_hierarchy" interface

From: Vivek Goyal
Date: Mon Dec 13 2010 - 22:29:50 EST


On Tue, Dec 14, 2010 at 11:06:26AM +0800, Gui Jianfeng wrote:
> Vivek Goyal wrote:
> > On Mon, Dec 13, 2010 at 09:44:10AM +0800, Gui Jianfeng wrote:
> >> Hi
> >>
> >> Previously, I posted a patchset adding support for CFQ group hierarchical
> >> scheduling by putting all CFQ queues into a hidden group and scheduling that
> >> group against the other CFQ groups under their parent. The patchset is
> >> available here:
> >> http://lkml.org/lkml/2010/8/30/30
> >>
> >> Vivek felt this approach isn't intuitive and that we should instead treat CFQ
> >> queues and groups at the same level. Here is the new approach for hierarchical
> >> scheduling based on Vivek's suggestion. The biggest change to CFQ is that it
> >> gets rid of the cfq_slice_offset logic and uses vdisktime for CFQ queue
> >> scheduling, just as it already does for CFQ groups. I still give cfqqs some
> >> jump in vdisktime based on ioprio, thanks to Vivek for pointing this out. Now
> >> CFQ queues and CFQ groups use the same scheduling algorithm.
> >
> > Hi Gui,
> >
> > Thanks for the patches. A few thoughts.
> >
> > - I think we can implement the vdisktime jump logic for both cfq queues and
> > cfq groups. Any entity (queue/group) which is being backlogged fresh
> > will get the vdisktime jump, but anything which has been using its slice
> > will get queued at the end of the tree.
>
> Vivek,
>
> A vdisktime jump for both CFQ queues and CFQ groups is OK with me.
> What do you mean by "anything which has been using its slice will get queued
> at the end of the tree"?
> Currently, if a CFQ entity uses up its time slice, we update its vdisktime,
> so why should we put it at the end of the tree?

Sorry, what I actually meant was that any queue/group which has been using
its slice and is being requeued will be queued at a position based on the
vdisktime calculation, with no boost logic required. Queues/groups which get
queued fresh get a vdisktime boost. That way, once we set slice_idle=0 and
group_idle=0, we might get good bandwidth utilization while still providing
some service differentiation for higher-weight queues/groups.
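
If it helps to make the distinction concrete, here is roughly the placement
rule I have in mind, written as a small userspace toy. The names, constants
and fixed-point scaling below are made up for illustration; this is not code
from the current patches:

/*
 * - A freshly backlogged entity is placed at the tree's min_vdisktime plus
 *   a small ioprio-based offset, so higher-priority entities land closer to
 *   the front of the service tree instead of behind everyone already queued.
 * - A requeued entity that just used a slice is simply charged for it in
 *   proportion to its weight and positioned by its accumulated vdisktime,
 *   with no boost.
 */
#include <stdio.h>

struct entity {                         /* stands in for a cfq queue or group */
        unsigned long long vdisktime;
        unsigned int weight;
        int ioprio;                     /* 0 (highest) .. 7 (lowest) */
};

#define SERVICE_SHIFT   12              /* made-up fixed-point scaling */
#define IOPRIO_STEP     1000            /* made-up per-ioprio-level offset */

/* Requeue path: charge the entity for the slice it just consumed. */
static void charge_slice(struct entity *e, unsigned long long served)
{
        e->vdisktime += (served << SERVICE_SHIFT) / e->weight;
}

/* Fresh backlog path: place near min_vdisktime with an ioprio offset. */
static void enqueue_fresh(struct entity *e, unsigned long long min_vdisktime)
{
        unsigned long long offset = (unsigned long long)e->ioprio * IOPRIO_STEP;

        if (e->vdisktime < min_vdisktime)
                e->vdisktime = min_vdisktime;
        e->vdisktime += offset;
}

int main(void)
{
        unsigned long long min_vdisktime = 100000;
        struct entity fresh = { .vdisktime = 0,      .weight = 500, .ioprio = 0 };
        struct entity busy  = { .vdisktime = 100000, .weight = 500, .ioprio = 4 };

        charge_slice(&busy, 100);               /* requeued after a 100ms slice */
        enqueue_fresh(&fresh, min_vdisktime);   /* freshly backlogged, boosted */

        printf("fresh placed at %llu, requeued at %llu\n",
               fresh.vdisktime, busy.vdisktime);
        return 0;
}

So the boost only rearranges entities that are just becoming busy; anything
that keeps consuming slices pays for them through its vdisktime as usual.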

>
>
> >
> > - Have you done testing in true hierarchical mode? That is, create at least
> > two levels of hierarchy and see if the bandwidth division happens properly.
> > Something like the following:
> >
> >                root
> >               /    \
> >           test1    test2
> >           /   \    /   \
> >          G1   G2  G3   G4
>
> Yes, I tested with two levels, and it works fine.
>
> >
> > - On what kind of storage have you been doing your testing? I have noticed
> > that the IO controller works well only with idling on, and with idling on,
> > performance is bad on high-end storage. The simple reason is that
> > a storage array can support multiple IOs at the same time, and if we
> > idle on a queue or group in an attempt to provide fairness, it hurts.
> > It hurts especially if we are doing random IO (which I assume is more
> > typical of real workloads).
> >
> > So we need to come up with proper logic so that we can provide some
> > kind of fairness even with idling disabled. I think that's where this
> > vdisktime jump logic comes into the picture, and it is important to get
> > it right.
> >
> > So can you also do some testing with idling disabled (both queue
> > and group) and see if the vdisktime logic helps provide some kind of
> > service differentiation? I think the results will vary based on the
> > storage and on what queue depth you are driving. You can even try this
> > testing on an SSD.
>
> I tested on SATA. I will do more tests with idling disabled.

Ok, actually SATA with a low queue depth is the case where the block IO
controller works best. I am also keen to make it work well for SSDs and faster
storage like storage arrays without losing too much throughput in the process.
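
As a reference point for the two-level test above (test1/test2 with G1-G4):
with idealized proportional-to-weight sharing at every level, and all groups
keeping the disk busy, each leaf's share of disk time should be the product of
its weight fraction along the path from the root. A quick sketch, with made-up
weights, of the baseline I would compare the measured bandwidth against:

#include <stdio.h>

int main(void)
{
        /* example weights, not defaults: test1=500, test2=500;
         * G1=500, G2=1000 under test1; G3=500, G4=500 under test2 */
        double test1 = 500, test2 = 500;
        double g1 = 500, g2 = 1000, g3 = 500, g4 = 500;

        double t1_share = test1 / (test1 + test2);
        double t2_share = test2 / (test1 + test2);

        /* a leaf's share = product of weight fractions along its path */
        printf("G1: %.1f%%\n", 100.0 * t1_share * g1 / (g1 + g2));  /* 16.7% */
        printf("G2: %.1f%%\n", 100.0 * t1_share * g2 / (g1 + g2));  /* 33.3% */
        printf("G3: %.1f%%\n", 100.0 * t2_share * g3 / (g3 + g4));  /* 25.0% */
        printf("G4: %.1f%%\n", 100.0 * t2_share * g4 / (g3 + g4));  /* 25.0% */
        return 0;
}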

Thanks
Vivek