Re: [RFT][PATCH] sched, cgroup: Optimize load_balance_fair()

From: Paul Turner
Date: Wed Jul 13 2011 - 17:15:16 EST


On Wed, Jul 13, 2011 at 2:02 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Wed, 2011-07-13 at 10:13 -0700, Paul Turner wrote:
>> Nice! The continued usage of task_groups had been irking me for a
>> while but I haven't had the time to scratch the itch :).
>>
>> On Wed, Jul 13, 2011 at 4:36 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>> > Subject: sched, cgroup: Optimize load_balance_fair()
>> > From: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
>> > Date: Wed Jul 13 13:09:25 CEST 2011
>> >
>> > Use for_each_leaf_cfs_rq() instead of list_for_each_entry_rcu(), this
>> > achieves that load_balance_fair() only iterates those task_groups that
>> > actually have tasks on busiest, and that we iterate bottom-up, trying to
>> > move light groups before the heavier ones.
>> >
>> > No idea if it will actually work out to be beneficial in practice, does
>> > anybody have a cgroup workload that might show a difference one way or
>> > the other?
>> >
>> > [ Also move update_h_load to sched_fair.c, losing #ifdef-ery ]
>> >
>> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
>
>> Reviewed-by: Paul Turner <pjt@xxxxxxxxxx>
>
> So you think I should just merge it and see if any cgroup workload
> dislikes it?
>
> OK, I guess I can do that..
>

The task_groups list order was completely arbitrary anyway and had an
unfair bias against the groups most recently created.

I experimented a while back with a change that built a (task) h_load
ordered heap and chose which group to balance based on that. It
improved sum(run_delay) on a few workloads and didn't seem to regress
anything. (Unfortunately this was quite a while back and I don't have
the data available anymore.)
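
For illustration only, a minimal user-space sketch of an h_load-ordered
heap might look like the following. This is not the experimental code
mentioned above, which is not available. struct group_load, struct
load_heap, heap_push(), heap_pop() and MAX_GROUPS are hypothetical names,
not kernel APIs, and popping the lightest group first is an assumption
chosen to match the "light groups before the heavier ones" ordering in the
patch description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-group entry; not a kernel structure. */
struct group_load {
	int id;			/* illustrative group identifier */
	unsigned long h_load;	/* hierarchical load of the group */
};

#define MAX_GROUPS 64		/* arbitrary capacity for the sketch */

struct load_heap {
	struct group_load items[MAX_GROUPS];
	size_t len;
};

static void swap_items(struct group_load *a, struct group_load *b)
{
	struct group_load tmp = *a;

	*a = *b;
	*b = tmp;
}

/* Insert a group, sifting up so the lightest h_load stays at the root. */
static void heap_push(struct load_heap *h, struct group_load g)
{
	size_t i;

	assert(h->len < MAX_GROUPS);
	i = h->len++;
	h->items[i] = g;

	while (i > 0) {
		size_t parent = (i - 1) / 2;

		if (h->items[parent].h_load <= h->items[i].h_load)
			break;
		swap_items(&h->items[parent], &h->items[i]);
		i = parent;
	}
}

/* Remove and return the lightest group; the heap must not be empty. */
static struct group_load heap_pop(struct load_heap *h)
{
	struct group_load top;
	size_t i = 0;

	assert(h->len > 0);
	top = h->items[0];
	h->items[0] = h->items[--h->len];

	/* Sift the moved element down until both children are heavier. */
	for (;;) {
		size_t left = 2 * i + 1, right = 2 * i + 2, min = i;

		if (left < h->len && h->items[left].h_load < h->items[min].h_load)
			min = left;
		if (right < h->len && h->items[right].h_load < h->items[min].h_load)
			min = right;
		if (min == i)
			break;
		swap_items(&h->items[i], &h->items[min]);
		i = min;
	}
	return top;
}
```

A balancer built on this would push one entry per group with tasks on the
busiest CPU and then pop groups in h_load order until enough load has been
moved. Flipping the two comparisons would give heaviest-first instead.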

Using for_each_leaf_cfs_rq approximates that, which leaves my
expectations somewhere between "can't hurt -- it's already arbitrary"
and "might help". Moreover, if there are cases that result in bad
behavior under this new ordering, then their group creation order can
likely be permuted to create the same problem under the former
task_groups ordering, making it a bug we have to deal with either way.