Re: [RFC] [PATCH] cfq-iosched: add cfq group hierarchical scheduling support

From: Nauman Rafique
Date: Wed Sep 01 2010 - 13:16:02 EST


On Wed, Sep 1, 2010 at 10:10 AM, Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> On Wed, Sep 01, 2010 at 08:49:26AM -0700, Nauman Rafique wrote:
>> On Wed, Sep 1, 2010 at 1:50 AM, Gui Jianfeng <guijianfeng@xxxxxxxxxxxxxx> wrote:
>> > Vivek Goyal wrote:
>> >> On Tue, Aug 31, 2010 at 08:40:19AM -0700, Nauman Rafique wrote:
>> >>> On Tue, Aug 31, 2010 at 5:57 AM, Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
>> >>>> On Tue, Aug 31, 2010 at 08:29:20AM +0800, Gui Jianfeng wrote:
>> >>>>> Vivek Goyal wrote:
>> >>>>>> On Mon, Aug 30, 2010 at 02:50:40PM +0800, Gui Jianfeng wrote:
>> >>>>>>> Hi All,
>> >>>>>>>
>> >>>>>>> This patch enables cfq group hierarchical scheduling.
>> >>>>>>>
>> >>>>>>> With this patch, you can create a cgroup directory deeper than level 1.
>> >>>>>>> Now, I/O bandwidth is distributed hierarchically. For example:
>> >>>>>>> We create cgroup directories as follows (the number represents weight):
>> >>>>>>>
>> >>>>>>>             Root grp
>> >>>>>>>            /       \
>> >>>>>>>       grp_1(100) grp_2(400)
>> >>>>>>>       /    \
>> >>>>>>>  grp_3(200) grp_4(300)
>> >>>>>>>
>> >>>>>>> If grp_2 grp_3 and grp_4 are contending for I/O Bandwidth,
>> >>>>>>> grp_2 will share 80% of total bandwidth.
>> >>>>>>> For sub_groups, grp_3 shares 8%(20% * 40%), grp_4 shares 12%(20% * 60%)
>> >>>>>>>
>> >>>>>>> Design:
>> >>>>>>>   o Each cfq group has its own group service tree.
>> >>>>>>>   o Each cfq group contains a "group schedule entity" (gse) that
>> >>>>>>>     schedules on parent cfq group's service tree.
>> >>>>>>>   o Each cfq group contains a "queue schedule entity" (qse); it
>> >>>>>>>     represents all cfqqs located on this cfq group. It schedules
>> >>>>>>>     on this group's service tree. For the time being, root group
>> >>>>>>>     qse's weight is 1000, and subgroup qse's weight is 500.
>> >>>>>>>   o All gses and the qse which belong to the same cfq group schedule
>> >>>>>>>     on the same group service tree.
>> >>>>>> Hi Gui,
>> >>>>>>
>> >>>>>> Thanks for the patch. I have a few questions.
>> >>>>>>
>> >>>>>> - So what does the hierarchy look like w.r.t. the root group? Something as
>> >>>>>>   follows?
>> >>>>>>
>> >>>>>>
>> >>>>>>                     root
>> >>>>>>                    / | \
>> >>>>>>                  q1  q2 G1
>> >>>>>>
>> >>>>>> Assume there are two processes doing IO in root group and q1 and q2 are
>> >>>>>> cfqq queues for those processes and G1 is the cgroup created by user.
>> >>>>>>
>> >>>>>> If yes, then what algorithm do you use to do scheduling between q1, q2
>> >>>>>> and G1? IOW, currently we have two algorithms operating in CFQ. One for
>> >>>>>> cfqq and other for groups. Group algorithm does not use the logic of
>> >>>>>> cfq_slice_offset().
>> >>>>> Hi Vivek,
>> >>>>>
>> >>>>> This patch doesn't break the original scheduling logic. That is cfqg => st => cfqq.
>> >>>>> If q1 and q2 are in root group, I treat the q1 and q2 bundle as a queue sched entity, and
>> >>>>> it will schedule on the root group service tree with G1, as follows:
>> >>>>>
>> >>>>>                          root group
>> >>>>>                         /         \
>> >>>>>                     qse(q1,q2)    gse(G1)
>> >>>>>
>> >>>> Ok. That's interesting. That raises another question: what should the
>> >>>> hierarchy look like? IOW, how should queues and groups be treated in the
>> >>>> hierarchy?
>> >>>>
>> >>>> CFS cpu scheduler treats queues and group at the same level. That is as
>> >>>> follows.
>> >>>>
>> >>>>                         root
>> >>>>                         / | \
>> >>>>                       q1 q2 G1
>> >>>>
>> >>>> In the past I had raised this question and Jens and Corrado liked treating
>> >>>> queues and groups at the same level.
>> >>>>
>> >>>> Logically, q1, q2 and G1 are all children of root, so it makes sense to
>> >>>> treat them at the same level and not group q1 and q2 into a single entity and
>> >>>> group.
>> >>>>
>> >>>> One possible way forward could be this.
>> >>>>
>> >>>> - Treat queue and group at same level (like CFS)
>> >>>>
>> >>>> - Get rid of cfq_slice_offset() logic. That means without idling on, there
>> >>>>   will be no ioprio difference between cfq queues. I think anyway as of
>> >>>>   today that logic helps in so few situations that I would not mind
>> >>>>   getting rid of it. Just that Jens should agree to it.
>> >>>>
>> >>>> - With this new scheme, it will break the existing semantics of root group
>> >>>>   being at same level as child groups. To avoid that, we can probably
>> >>>>   implement two modes (flat and hierarchical), something similar to what
>> >>>>   memory cgroup controller has done. Maybe one tunable in root cgroup of
>> >>>>   blkio "use_hierarchy". By default everything will be in flat mode and
>> >>>>   if user wants hierarchical control, he needs to set use_hierarchy in
>> >>>>   root group.
>> >>> Vivek, maybe I am reading you wrong here. But you are first
>> >>> suggesting adding more complexity to treat queues and groups at the
>> >>> same level. Then you are suggesting adding even more complexity to fix
>> >>> the problems caused by that approach.
>> >>>
>> >>> Why do we need to treat queues and group at the same level? "CFS does
>> >>> it" is not a good argument.
>> >>
>> >> Sure it is not a very good argument but at the same time one would need
>> >> a very good argument for why we should do things differently.
>> >>
>> >> - If a user has mounted cpu and blkio controller together and both the
>> >>   controllers are viewing the same hierarchy differently, then it is
>> >>   odd. We need a good reason why a different arrangement makes sense.
>> >
>> > Hi Vivek,
>> >
>> > Even if we mount cpu and blkio together, to me, it's ok for cpu and blkio
>> > to have their own logic, since they are totally different cgroup subsystems.
>> >
>> >>
>> >> - To me, both group and cfq queue are children of root group and it
>> >>   makes sense to treat them as independent children instead of putting
>> >>   all the queues in one logical group which inherits the weight of
>> >>   parent.
>> >>
>> >> - With this new scheme, I am finding it hard to visualize the hierarchy.
>> >>   How do you assign the weights to queue entities of a group? It is more
>> >>   like an invisible group within a group. We shall have to create a new
>> >>   tunable which can specify the weight for this hidden group.
>> >
>> > For the time being, the root "qse" weight is 1000 and the others are 500; they don't
>> > inherit the weight of parent. I was thinking that maybe we can determine the qse
>> > weight in terms of the queue number and weight in this group and subgroups.
>> >
>> > Thanks,
>> > Gui
>> >
>> >>
>> >>
>> >> So in summary I am liking the "queue at same level as group" scheme for
>> >> three reasons.
>> >>
>> >> - It is more intuitive to visualize and implement. It follows the true
>> >>   hierarchy as seen by cgroup file system.
>> >>
>> >> - CFS has already implemented this scheme. So we need a strong argument
>> >>   to justify why we should not follow the same thing. Especially for
>> >>   the case where user has co-mounted cpu and blkio controller.
>> >>
>> >> - It can achieve the same goal as "hidden group" proposal just by
>> >>   creating a cgroup explicitly and moving all threads in that group.
>> >>
>> >> Why do you think that "hidden group" proposal is better than "treating
>> >> queue at same level as group" ?
>>
>> There are multiple reasons for "hidden group" proposal being a better approach.
>>
>> - "Hidden group" would allow us to keep scheduling queues using the
>> CFQ queue scheduling logic. And does not require any major changes in
>> CFQ. Aren't we already using that approach to deal with queues at the
>> root group?
>
> Currently we are operating in flat mode where all the groups are at
> the same level (irrespective of their position in the cgroup hierarchy).
>
>>
>> - If queues and groups are treated at the same level, queues can end
>> up in root cgroup. And we cannot put an upper bound on the number of
>> those queues. Those queues can consume system resources in proportion
>> to their number, causing the performance of groups to suffer. If we
>> have "hidden group", we can configure it to a small weight, and that
>> would limit the impact these queues in root group can have.
>
> To limit the impact of other queues in cgroup, one can use libcgroup to
> automatically place new threads or tasks into a subgroup.
>
> I understand that kernel doing it by default should help though. It is
> less work in terms of configuration. But I am not sure that's a good
> argument to design kernel functionality. Kernel functionality should be
> pretty generic.
>
> Anyway, how would you assign the weight to the hidden group? What's the
> interface for that? A new cgroup file inside each cgroup? Personally
> I think that's a little odd as an interface. Every group has one hidden group
> where all the queues in that group go and weight of that group can be
> specified by a cgroup file.

I think picking a reasonable default weight at compile time is not
that bad an option, given that threads showing up in the "hidden
group" is an uncommon case.
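
To put rough numbers on the two schemes, here is a minimal sketch of the
share arithmetic (plain Python, purely illustrative, not kernel code). The
function names and the per-queue weight of 500 in the same-level case are
assumptions made up for this example; the hidden-group weight of 1000 is
the root qse weight from Gui's patch description. Each busy entity gets its
parent's share multiplied by its weight over the sum of its siblings'
weights.

```python
from fractions import Fraction


def shares(tree, total=Fraction(1)):
    """Split `total` among the entities in `tree` in proportion to weight.

    `tree` maps a name to (weight, subtree); a leaf has an empty subtree.
    Only busy entities should be listed. Returns {leaf_name: share}.
    """
    result = {}
    weight_sum = sum(weight for weight, _ in tree.values())
    for name, (weight, subtree) in tree.items():
        share = total * weight / weight_sum
        if subtree:
            # Interior node: hand its share down to its children.
            result.update(shares(subtree, share))
        else:
            result[name] = share
    return result


def hidden_group_tree(n_queues, hidden_weight=1000, group_weight=500):
    """Root-level queues bundled into one hidden entity next to G1."""
    queues = {f"q{i}": (1, {}) for i in range(1, n_queues + 1)}
    return {"hidden": (hidden_weight, queues), "G1": (group_weight, {})}


def same_level_tree(n_queues, queue_weight=500, group_weight=500):
    """Root-level queues scheduled as siblings of G1 (hypothetical weights)."""
    tree = {f"q{i}": (queue_weight, {}) for i in range(1, n_queues + 1)}
    tree["G1"] = (group_weight, {})
    return tree


# Gui's example hierarchy, with grp_2, grp_3 and grp_4 contending.
gui_example = {
    "grp_1": (100, {"grp_3": (200, {}), "grp_4": (300, {})}),
    "grp_2": (400, {}),
}

if __name__ == "__main__":
    print(shares(gui_example))
    for n in (1, 4, 16):
        print(n,
              shares(hidden_group_tree(n))["G1"],
              shares(same_level_tree(n))["G1"])
```

Under these assumed weights, G1 keeps 1/3 of the bandwidth in the hidden
group case no matter how many queues sit in the root group, while in the
same-level case its share drops from 1/2 to 1/17 as the queue count goes
from 1 to 16.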

>
> But anyway, I am not tied to either approach. I am just trying to
> make sure that we have put enough thought into it as changing it later
> will be hard.
>
> Vivek
>