Re: [PATCH] Load balancing problem in 2.6.2-mm1
From: Nick Piggin
Date: Sat Feb 07 2004 - 23:06:45 EST
Anton Blanchard wrote:
Hi,
Yeah its because you have a lot of cpus, so the average is still
small. You also need something like
if (*imbalance == 0 && max_load - this_load > SCHED_LOAD_SCALE)
*imbalance = 1;
OK I'll give that a try.
Can you try this patch instead pretty please ;)
Thanks
linux-2.6-npiggin/kernel/sched.c | 52 ++++++++++++++++++++++++---------------
1 files changed, 32 insertions(+), 20 deletions(-)
diff -puN kernel/sched.c~rollup kernel/sched.c
--- linux-2.6/kernel/sched.c~rollup 2004-02-08 15:03:53.000000000 +1100
+++ linux-2.6-npiggin/kernel/sched.c 2004-02-08 15:03:53.000000000 +1100
@@ -1405,16 +1405,28 @@ find_busiest_group(struct sched_domain *
total_load += avg_load;
total_nr_cpus += nr_cpus;
- avg_load /= nr_cpus;
+
+ /*
+ * Load is cumulative over SD_FLAG_IDLE domains, but
+ * spread over !SD_FLAG_IDLE domains. For example, 2
+ * processes running on an SMT CPU puts a load of 2 on
+ * that CPU, however 2 processes running on 2 CPUs puts
+ * a load of 1 on that domain.
+ *
+ * This should be configurable so as SMT siblings become
+ * more powerful, they can "spread" more load - for example,
+ * the above case might only count as a load of 1.7.
+ */
+ if (!(domain->flags & SD_FLAG_IDLE))
+ avg_load /= nr_cpus;
+
+ if (avg_load > max_load)
+ max_load = avg_load;
if (local_group) {
this_load = avg_load;
- goto nextgroup;
- }
-
- if (avg_load >= max_load) {
+ } else if (avg_load >= max_load) {
busiest = group;
- max_load = avg_load;
busiest_nr_cpus = nr_cpus;
}
nextgroup:
@@ -1424,8 +1436,10 @@ nextgroup:
if (!busiest)
goto out_balanced;
- avg_load = total_load / total_nr_cpus;
- if (idle == NOT_IDLE && this_load >= avg_load)
+ if (!(domain->flags & SD_FLAG_IDLE))
+ avg_load = total_load / total_nr_cpus;
+
+ if (this_load >= avg_load)
goto out_balanced;
if (idle == NOT_IDLE && 100*max_load <= domain->imbalance_pct*this_load)
@@ -1437,20 +1451,18 @@ nextgroup:
* reduce the max loaded cpu below the average load, as either of these
* actions would just result in more rebalancing later, and ping-pong
* tasks around. Thus we look for the minimum possible imbalance.
+ * Negative imbalances (*we* are more loaded than anyone else) will
+ * be counted as no imbalance for these purposes -- we can't fix that
+ * by pulling tasks to us. Be careful of negative numbers as they'll
+ * appear as very large values with unsigned longs.
*/
- *imbalance = min(max_load - avg_load, avg_load - this_load);
-
- /* Get rid of the scaling factor now, rounding *up* as we divide */
- *imbalance = (*imbalance + SCHED_LOAD_SCALE - 1) >> SCHED_LOAD_SHIFT;
-
- if (*imbalance == 0) {
- if (package_idle != NOT_IDLE && domain->flags & SD_FLAG_IDLE
- && max_load * busiest_nr_cpus > (3*SCHED_LOAD_SCALE/2))
- *imbalance = 1;
- else
- busiest = NULL;
- }
+ *imbalance = min(max_load - avg_load, avg_load - this_load) / 2;
+ /* Get rid of the scaling factor, rounding *up* as we divide */
+ *imbalance = (*imbalance + SCHED_LOAD_SCALE-1)
+ >> SCHED_LOAD_SHIFT;
+ if (*imbalance == 0 && (max_load - this_load) > SCHED_LOAD_SCALE)
+ *imbalance = 1;
return busiest;
out_balanced:
_