Re: [PATCH] Load balancing problem in 2.6.2-mm1

From: Rick Lindsley
Date: Sat Feb 07 2004 - 19:44:43 EST


It's got to be an overly enthusiastic active balance: the migration threads
have used about 10 minutes of CPU time, and a single CPU-bound process
will never sleep (assuming there is nothing else to run) and so cannot be
moved by normal means.

The current imbalance code rounds up to 1, meaning that we'll often
see an "imbalance" of 1 even when the loads are 1 and 0 and the task has
only just been moved. Did you see these results even with Martin's patch
to not round up to 1?
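
To make the rounding point concrete, here is a minimal sketch. It is not
the actual sched.c calculation: both helper names are hypothetical, and
"half the load difference" is only an assumed stand-in for the real
imbalance formula. It just shows why rounding up reports an imbalance of 1
for loads of 1 and 0, while truncating reports 0.

```c
#include <assert.h>

/*
 * Hypothetical helpers for illustration only; the real imbalance
 * calculation in kernel/sched.c is more involved than this.
 */

/* Half the load difference, rounded up: a difference of 1 yields 1. */
unsigned long imbalance_round_up(unsigned long busiest_load,
				 unsigned long this_load)
{
	assert(busiest_load >= this_load);
	return (busiest_load - this_load + 1) / 2;
}

/* Half the load difference, truncated: a difference of 1 yields 0. */
unsigned long imbalance_truncate(unsigned long busiest_load,
				 unsigned long this_load)
{
	assert(busiest_load >= this_load);
	return (busiest_load - this_load) / 2;
}
```

With one runnable task on the busiest CPU and none here, the rounded-up
version still asks for one task to be moved, so the lone task can bounce
back and forth; the truncated version asks for nothing. For larger, even
differences the two agree.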

The easiest way to turn off the active balance (for this test, at least)
is this patch, which simply compiles that code out:

diff -rup linux-2.6.2-mm1/kernel/sched.c linux-2.6.2-mm1+/kernel/sched.c
--- linux-2.6.2-mm1/kernel/sched.c Thu Feb 5 14:47:17 2004
+++ linux-2.6.2-mm1+/kernel/sched.c Sat Feb 7 16:39:18 2004
@@ -1525,6 +1525,7 @@ out:
 	if (!balanced && nr_moved == 0)
 		failed = 1;
 
+#if 0
 	if (domain->flags & SD_FLAG_IDLE && failed && busiest &&
 			domain->nr_balance_failed > domain->cache_nice_tries) {
 		int i;
@@ -1546,6 +1547,7 @@ out:
 			wake_up_process(busiest->migration_thread);
 		}
 	}
+#endif
 
 	if (failed)
 		domain->nr_balance_failed++;

This is not the right long-term solution, but it should at least let us
pin down where this obviously incorrect behavior is coming from.
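
For anyone following along, the condition being compiled out can be
modelled roughly as below. This is a toy sketch, not kernel code: struct
toy_domain, both function names, and the value given to SD_FLAG_IDLE are
assumptions made for illustration. It also assumes the failure counter is
never reset, which the lines elided from the patch may well do.

```c
#include <stdbool.h>

#define SD_FLAG_IDLE 0x1	/* assumed value, for illustration only */

/* Toy stand-in for the scheduler domain fields named in the patch. */
struct toy_domain {
	unsigned int flags;
	unsigned int nr_balance_failed;
	unsigned int cache_nice_tries;
};

/* Mirrors the condition guarded by "#if 0" in the patch. */
bool wants_active_balance(const struct toy_domain *d, bool failed,
			  bool have_busiest)
{
	return (d->flags & SD_FLAG_IDLE) && failed && have_busiest &&
	       d->nr_balance_failed > d->cache_nice_tries;
}

/*
 * Simulate `rounds` consecutive failed balance attempts and count how
 * many of them would request an active balance. The counter is bumped
 * after each failure, as in "if (failed) domain->nr_balance_failed++".
 */
unsigned int count_active_requests(struct toy_domain *d, unsigned int rounds)
{
	unsigned int i, hits = 0;

	for (i = 0; i < rounds; i++) {
		if (wants_active_balance(d, true, true))
			hits++;
		d->nr_balance_failed++;
	}
	return hits;
}
```

Under those assumptions, once nr_balance_failed passes cache_nice_tries
every further failed attempt asks for an active balance, which would fit
the migration threads being woken far too often.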

Rick