Re: allow the load to grow upto its cpu_power (was Re: [Patch]don't kick ALB in the presence of pinned task)

From: Nick Piggin
Date: Wed Aug 10 2005 - 22:11:15 EST


On Tue, 2005-08-09 at 19:03 -0700, Siddha, Suresh B wrote:
> On Wed, Aug 10, 2005 at 10:27:44AM +1000, Nick Piggin wrote:
> > Yeah this makes sense. Thanks.
> >
> > I think we'll only need your first line change to fix this, though.
> >
> > Your second change will break situations where a single group is very
> > loaded, but it is in a domain with lots of cpu_power
> > (total_load <= total_pwr).
>
> In that case, we will move the excess load from that group to some
> other group which is below its capacity. Instead of bringing everyone
> to avg load, we make sure that everyone is at or below its cpu_power.
> This will minimize the movements between the nodes.
>
> For example, let us assume sched groups node-0 and node-1 each have
> 4*SCHED_LOAD_SCALE as their cpu_power.
>
> And with 6 tasks on node-0 and 0 on node-1, current load balance
> will move 3 tasks from node-0 to 1. But with my patch, it will move only
> 2 tasks to node-1. Is this what you are referring to as breakage?
>

No, I had thought it was possible to get into a situation where
one queue could be very loaded but not have anyone to pull from
it if total_load <= total_pwr.

I see that shouldn't happen though.

I have a variation on the 2nd part of your patch which I think
I would prefer. IMO it kind of generalises the current imbalance
calculation to handle this case rather than introducing a new
special case.

Untested as yet, but I'll queue it to send to Andrew after it
gets some testing - unless you have any objections, that is.

Thanks,
Nick

--
SUSE Labs, Novell Inc.


Don't pull tasks from a group if that would cause the
group's total load to drop below its total cpu_power
(i.e. cause the group to start going idle).

Signed-off-by: Suresh Siddha <suresh.b.siddha@xxxxxxxxx>
Signed-off-by: Nick Piggin <npiggin@xxxxxxx>

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c 2005-08-11 12:10:10.199651212 +1000
+++ linux-2.6/kernel/sched.c 2005-08-11 12:53:15.361971195 +1000
@@ -1886,6 +1886,7 @@
 {
 	struct sched_group *busiest = NULL, *this = NULL, *group = sd->groups;
 	unsigned long max_load, avg_load, total_load, this_load, total_pwr;
+	unsigned long max_pull;
 	int load_idx;

 	max_load = this_load = total_load = total_pwr = 0;
@@ -1932,7 +1933,7 @@
 		group = group->next;
 	} while (group != sd->groups);

-	if (!busiest || this_load >= max_load)
+	if (!busiest || this_load >= max_load || max_load <= SCHED_LOAD_SCALE)
 		goto out_balanced;

 	avg_load = (SCHED_LOAD_SCALE * total_load) / total_pwr;
@@ -1952,8 +1953,12 @@
 	 * by pulling tasks to us. Be careful of negative numbers as they'll
 	 * appear as very large values with unsigned longs.
 	 */
+
+	/* Don't want to pull so many tasks that a group would go idle */
+	max_pull = min(max_load - avg_load, max_load - SCHED_LOAD_SCALE);
+
 	/* How much load to actually move to equalise the imbalance */
-	*imbalance = min((max_load - avg_load) * busiest->cpu_power,
+	*imbalance = min(max_pull * busiest->cpu_power,
 				(avg_load - this_load) * this->cpu_power)
 			/ SCHED_LOAD_SCALE;
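
To make the arithmetic behind the node-0/node-1 example concrete, here is a rough userspace sketch of the imbalance calculation with and without the max_pull clamp. It is only a model, not the kernel code: struct group_model, min_ul() and imbalance_model() are hypothetical names invented for illustration, it considers just two groups, SCHED_LOAD_SCALE is assumed to be 128, and the extra underflow guards are additions of this sketch.

```c
#define SCHED_LOAD_SCALE 128UL	/* assumed value for this sketch */

/* Hypothetical stand-in for the fields of a sched group used here. */
struct group_model {
	unsigned long load;		/* summed task load of the group */
	unsigned long cpu_power;	/* capacity of the group */
};

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Load to move from 'busiest' to 'this_grp' in a two-group domain.
 * clamp == 0 models the calculation before the patch, clamp != 0 the
 * calculation with the max_load <= SCHED_LOAD_SCALE test and max_pull.
 */
static unsigned long imbalance_model(const struct group_model *busiest,
				     const struct group_model *this_grp,
				     int clamp)
{
	/* Per-group load, scaled by the group's cpu_power. */
	unsigned long max_load = busiest->load * SCHED_LOAD_SCALE /
				 busiest->cpu_power;
	unsigned long this_load = this_grp->load * SCHED_LOAD_SCALE /
				  this_grp->cpu_power;
	unsigned long total_load = busiest->load + this_grp->load;
	unsigned long total_pwr = busiest->cpu_power + this_grp->cpu_power;
	unsigned long avg_load, max_pull;

	if (this_load >= max_load)
		return 0;
	/* Patched variant: a group at or below its cpu_power is left alone. */
	if (clamp && max_load <= SCHED_LOAD_SCALE)
		return 0;

	avg_load = SCHED_LOAD_SCALE * total_load / total_pwr;

	/* Guards added in this sketch so the unsigned subtractions cannot wrap. */
	if (avg_load <= this_load || max_load <= avg_load)
		return 0;

	max_pull = max_load - avg_load;
	if (clamp)
		max_pull = min_ul(max_pull, max_load - SCHED_LOAD_SCALE);

	return min_ul(max_pull * busiest->cpu_power,
		      (avg_load - this_load) * this_grp->cpu_power)
		/ SCHED_LOAD_SCALE;
}
```

With node-0 holding 6 tasks (load 6*SCHED_LOAD_SCALE), node-1 idle, and both groups at cpu_power 4*SCHED_LOAD_SCALE, the unclamped variant returns 3*SCHED_LOAD_SCALE and the clamped one 2*SCHED_LOAD_SCALE, which matches the 3-task versus 2-task figures discussed above. With only 3 tasks on node-0, the clamped variant returns 0, since the group is still below its cpu_power.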