[PATCH] sched: make sure busiest group and run queue are pullable
From: Peter Williams
Date: Fri Mar 24 2006 - 22:41:28 EST
Peter Williams wrote:
Peter Williams wrote:
Siddha, Suresh B wrote:
more issues with the smpnice patch...
a) consider a 4-way system (a simple SMP system with no HT and no
multiple cores) in a scenario where a high priority task (nice -20) is
running on P0 and two normal priority tasks are running on P1. Load
balancing with the smp nice code will never be able to detect an
imbalance and hence will never move one of the normal priority tasks on
P1 to the idle CPUs P2 or P3.
Why?
OK, I think I know why. The load balancing code will always decide that
P0 is the busiest CPU, right?
Attached is a patch that addresses this problem. The strategies
employed are:
1. in find_busiest_group(), only consider groups that have at least one
CPU with more than one running task as candidates for "busiest"; and
2. in find_busiest_queue(), only consider queues that have more than one
running task as candidates for "busiest" (sketched below).
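For illustration, applying check 2 to the same toy numbers as above (again, assumed weights, not the real smpnice values) makes the selection land on the only queue that can actually donate a task:

#include <stdio.h>

struct rq_model { unsigned int nr_running; unsigned long weighted_load; };

int main(void)
{
	struct rq_model cpus[4] = {
		{ 1, 3000 },	/* P0: heavy but not pullable */
		{ 2, 2000 },	/* P1: pullable */
		{ 0, 0 },	/* P2: idle */
		{ 0, 0 },	/* P3: idle */
	};
	unsigned long max_load = 0;
	int busiest = -1, i;

	for (i = 0; i < 4; i++) {
		/* only queues with more than one running task are candidates */
		if (cpus[i].nr_running > 1 && cpus[i].weighted_load > max_load) {
			max_load = cpus[i].weighted_load;
			busiest = i;
		}
	}

	/* prints "busiest among pullable queues: P1" */
	printf("busiest among pullable queues: P%d\n", busiest);
	return 0;
}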
I think that the overhead saved by abandoning earlier those load
balancing attempts that would (most probably -- see the next paragraph)
eventually be abandoned anyway will compensate for the extra overhead
introduced in these functions.
I think that the only likely behavioural change on a system where all
tasks have nice==0 is this: without these checks, there is a small
chance that a "busiest" queue which has only one runnable task when
these tests are made (and from which move_tasks() would therefore
eventually move nothing) may acquire extra runnable tasks before the
locks are taken in preparation for calling move_tasks(), in which case
load balancing may actually take place. I think that this effect can be
safely ignored.
Signed-off-by: Peter Williams <pwil3058@xxxxxxxxxxxxxx>
Peter
--
Peter Williams pwil3058@xxxxxxxxxxxxxx
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
Index: MM-2.6.X/kernel/sched.c
===================================================================
--- MM-2.6.X.orig/kernel/sched.c 2006-03-25 13:43:06.000000000 +1100
+++ MM-2.6.X/kernel/sched.c 2006-03-25 13:56:37.000000000 +1100
@@ -2115,6 +2115,7 @@ find_busiest_group(struct sched_domain *
 		int local_group;
 		int i;
 		unsigned long sum_nr_running, sum_weighted_load;
+		unsigned int nr_loaded_cpus = 0; /* where nr_running > 1 */
 
 		local_group = cpu_isset(this_cpu, group->cpumask);
@@ -2135,6 +2136,8 @@ find_busiest_group(struct sched_domain *
 			avg_load += load;
 			sum_nr_running += rq->nr_running;
+			if (rq->nr_running > 1)
+				++nr_loaded_cpus;
 			sum_weighted_load += rq->raw_weighted_load;
 		}
@@ -2149,7 +2152,7 @@ find_busiest_group(struct sched_domain *
 			this = group;
 			this_nr_running = sum_nr_running;
 			this_load_per_task = sum_weighted_load;
-		} else if (avg_load > max_load) {
+		} else if (nr_loaded_cpus && avg_load > max_load) {
 			max_load = avg_load;
 			busiest = group;
 			busiest_nr_running = sum_nr_running;
@@ -2258,16 +2261,16 @@ out_balanced:
 static runqueue_t *find_busiest_queue(struct sched_group *group,
 	enum idle_type idle)
 {
-	unsigned long load, max_load = 0;
-	runqueue_t *busiest = NULL;
+	unsigned long max_load = 0;
+	runqueue_t *busiest = NULL, *rqi;
 	int i;
 
 	for_each_cpu_mask(i, group->cpumask) {
-		load = weighted_cpuload(i);
+		rqi = cpu_rq(i);
 
-		if (load > max_load) {
-			max_load = load;
-			busiest = cpu_rq(i);
+		if (rqi->nr_running > 1 && rqi->raw_weighted_load > max_load) {
+			max_load = rqi->raw_weighted_load;
+			busiest = rqi;
 		}
 	}