[PATCH] sched: smpnice prevent integer arithmetic wrap problems
From: Peter Williams
Date: Sun Mar 26 2006 - 18:41:00 EST
Peter Williams wrote:
> Siddha, Suresh B wrote:
>> more issues with the smpnice patch...
>>
>> a) consider a 4-way system (a simple SMP system with no HT or multiple
>> cores) scenario where a high priority task (nice -20) is running on P0
>> and two normal priority tasks are running on P1. load balancing with
>> the smpnice code will never be able to detect an imbalance and hence
>> will never move one of the normal priority tasks on P1 to the idle
>> cpus P2 or P3.
>
> Fix already sent.
>
>> b) smpnice seems to break this patch:
>>
>> [PATCH] sched: allow the load to grow upto its cpu_power
>> http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=0c117f1b4d14380baeed9c883f765ee023da8761
>>
>> example scenario for this case: consider a numa system with two nodes,
>> each node containing four processors. if there are two processes in
>> node-0 and node-1 is completely idle, your patch will move one of those
>> processes to node-1, whereas the previous behavior would retain those
>> two processes in node-0. (in this case, in your code max_load will be
>> less than busiest_load_per_task)
> I think that the patch I sent to address a) above will also fix this
> problem, as find_busiest_queue() will no longer find node-0 as the
> busiest group unless both of the processes in node-0 are on the same
> CPU. This is because it now only considers groups that have at least
> one CPU with more than one running task as candidates for being the
> busiest group.
>
> Implicit in this is the assumption that it's OK to move one of the
> tasks from node-0 to node-1 if they're both on the same CPU within
> node-0. Could you confirm this is OK?
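
(For anyone reading along without that earlier patch handy, the selection
rule described above amounts to something like the following standalone
mock-up. It is a simplification for illustration only, not the sched.c
code: the struct and field names here are hypothetical.)

#include <stdio.h>

/* Hypothetical, simplified stand-in for a sched group; these are
 * not the sched.c structures. */
struct mock_group {
	unsigned int nr_running[4];	/* per-CPU running-task counts */
	unsigned int nr_cpus;
};

/* A group is a candidate for "busiest" only if some CPU in it runs
 * more than one task -- otherwise a pull would just empty a CPU. */
static int busiest_candidate(const struct mock_group *g)
{
	unsigned int i;

	for (i = 0; i < g->nr_cpus; i++)
		if (g->nr_running[i] > 1)
			return 1;
	return 0;
}

int main(void)
{
	/* node-0 with its two processes on different CPUs ... */
	struct mock_group spread  = { { 1, 1, 0, 0 }, 4 };
	/* ... versus both stacked on one CPU. */
	struct mock_group stacked = { { 2, 0, 0, 0 }, 4 };

	printf("spread: %d\n", busiest_candidate(&spread));	/* 0: skipped */
	printf("stacked: %d\n", busiest_candidate(&stacked));	/* 1: candidate */
	return 0;
}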
It looks like my coffee was slow kicking in this morning :-)
When I looked at the code more carefully I realized that your
suggestion re comparing avg_load and busiest_load_per_task is needed to
protect the calculation of max_pull from integer arithmetic wrapping
problems. There was a big clue to this need in the comment above the
calculation of max_pull that I failed to read :-(
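
To spell the hazard out, here's a minimal user-space sketch (not kernel
code; the max_pull expression is paraphrased from the smpnice version of
find_busiest_group(), and the load values are made up):

#include <stdio.h>

#define min(a, b) ((a) < (b) ? (a) : (b))

int main(void)
{
	unsigned long max_load = 100;			/* busiest group's avg load */
	unsigned long avg_load = 80;			/* domain-wide average */
	unsigned long busiest_load_per_task = 120;	/* exceeds max_load */

	/*
	 * In ordinary arithmetic max_load - busiest_load_per_task is -20,
	 * but with unsigned longs it wraps to a huge positive value --
	 * exactly what the comment above max_pull warns about.
	 */
	printf("max_load - busiest_load_per_task = %lu\n",
	       max_load - busiest_load_per_task);
	printf("max_pull = %lu\n",
	       min(max_load - avg_load, max_load - busiest_load_per_task));

	/*
	 * The check added by the patch below bails out first: past it we
	 * know busiest_load_per_task < avg_load <= max_load (the domain
	 * average can't exceed the busiest group's load), so neither
	 * operand of min() can go negative and wrap.
	 */
	if (avg_load <= busiest_load_per_task)
		printf("-> goto out_balanced\n");
	return 0;
}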
Anyway, the attached patch should fix the problem. It should be applied
on top of the other patch.
Signed-off-by: Peter Williams <pwil3058@xxxxxxxxxxxxxx>
Peter
--
Peter Williams pwil3058@xxxxxxxxxxxxxx
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
Index: MM-2.6.X/kernel/sched.c
===================================================================
--- MM-2.6.X.orig/kernel/sched.c	2006-03-25 13:56:37.000000000 +1100
+++ MM-2.6.X/kernel/sched.c	2006-03-27 10:15:38.000000000 +1100
@@ -2161,7 +2161,7 @@ find_busiest_group(struct sched_domain *
 		group = group->next;
 	} while (group != sd->groups);
 
-	if (!busiest || this_load >= max_load || busiest_nr_running <= 1)
+	if (!busiest || this_load >= max_load)
 		goto out_balanced;
 
 	avg_load = (SCHED_LOAD_SCALE * total_load) / total_pwr;
@@ -2171,6 +2171,9 @@ find_busiest_group(struct sched_domain *
 		goto out_balanced;
 
 	busiest_load_per_task /= busiest_nr_running;
+
+	if (avg_load <= busiest_load_per_task)
+		goto out_balanced;
 	/*
 	 * We're trying to get all the cpus to the average_load, so we don't
 	 * want to push ourselves above the average load, nor do we wish to
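
For what it's worth, a rough sanity check of the new avg_load test
against Suresh's scenario b) above. All numbers are illustrative:
assume two tasks of weight SCHED_LOAD_SCALE (1024) stacked on one CPU
in node-0 (so that node-0 can be picked as busiest at all under the
earlier fix), four CPUs per node, and a cpu_power of SCHED_LOAD_SCALE
per CPU:

	total_load = 2 * 1024 = 2048;  total_pwr = 8 * 1024 = 8192
	avg_load   = (1024 * 2048) / 8192 = 256
	max_load   = (1024 * 2048) / 4096 = 512	(node-0 group)
	busiest_load_per_task = 2048 / 2 = 1024

	avg_load (256) <= busiest_load_per_task (1024)  =>  goto out_balanced

So the two tasks stay in node-0 as they did before smpnice, and the
wrap-prone max_load - busiest_load_per_task subtraction is never
reached.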