I tried a small experiment today: a simple restriction of the O(1)
scheduler so that it only balances inside a node. Coupled with
the small initial load balancing patch floating around, this
covers 95% of cases, is a trivial change (3 lines), performs
just as well as Erich's patch on a kernel compile, and actually
does better on schedbench.
This is NOT meant to be a replacement for the code Erich wrote;
it's meant to be a simple way to get integration and acceptance.
Code that just forks and never execs will stay on one node - but
we can take the code Erich wrote and put it in a separate rebalancer
that fires much less often to do a cross-node rebalance. All of that
would sit under #ifdef CONFIG_NUMA; the only thing that would touch
mainline is these three lines of change, and it's trivial to see
they're completely equivalent to the current code on non-NUMA systems.
I also believe this is the more correct design: it should result
in much less cross-node migration of tasks and less scanning of
remote runqueues.
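
(For concreteness: the patch leans on the per-arch topology macros.
The real definitions live in each arch's asm/topology.h; the sketch
below is purely illustrative, assuming a flat layout of 4 CPUs per
node, and is not the actual kernel code.)

/* Illustrative only - real versions are per-arch in asm/topology.h.
 * Assumes CPUs are laid out in consecutive blocks of 4 per node. */
#define CPUS_PER_NODE		4

/* Which node does this CPU live on? */
#define __cpu_to_node(cpu)	((cpu) / CPUS_PER_NODE)

/* Bitmask of all CPUs on a node, e.g. node 1 -> CPUs 4-7 -> 0xf0. */
#define __node_to_cpu_mask(node) \
	(((1UL << CPUS_PER_NODE) - 1) << ((node) * CPUS_PER_NODE))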
Opinions / comments?
M.
Kernbench:
                        Elapsed     User       System    CPU
2.5.54-mjb3             19.41s    186.38s    39.624s   1191.4%
2.5.54-mjb3-mjbsched    19.508s   186.356s   39.888s   1164.6%

Schedbench 4:
                        AvgUser   Elapsed   TotalUser   TotalSys
2.5.54-mjb3               0.00     35.14      88.82       0.64
2.5.54-mjb3-mjbsched      0.00     31.84      88.91       0.49

Schedbench 8:
                        AvgUser   Elapsed   TotalUser   TotalSys
2.5.54-mjb3               0.00     47.55     269.36       1.48
2.5.54-mjb3-mjbsched      0.00     41.01     252.34       1.07

Schedbench 16:
                        AvgUser   Elapsed   TotalUser   TotalSys
2.5.54-mjb3               0.00     76.53     957.48       4.17
2.5.54-mjb3-mjbsched      0.00     69.01     792.71       2.74

Schedbench 32:
                        AvgUser   Elapsed   TotalUser   TotalSys
2.5.54-mjb3               0.00    145.20    1993.97      11.05
2.5.54-mjb3-mjbsched      0.00    117.47    1798.93       5.95

Schedbench 64:
                        AvgUser   Elapsed   TotalUser   TotalSys
2.5.54-mjb3               0.00    307.80    4643.55      20.36
2.5.54-mjb3-mjbsched      0.00    241.04    3589.55      12.74
-----------------------------------------
diff -purN -X /home/mbligh/.diff.exclude virgin/kernel/sched.c mjbsched/kernel/sched.c
--- virgin/kernel/sched.c Mon Dec 9 18:46:15 2002
+++ mjbsched/kernel/sched.c Thu Jan 9 14:09:17 2003
@@ -654,7 +654,7 @@ static inline unsigned int double_lock_b
/*
* find_busiest_queue - find the busiest runqueue.
*/
-static inline runqueue_t *find_busiest_queue(runqueue_t *this_rq, int this_cpu, int idle, int *imbalance)
+static inline runqueue_t *find_busiest_queue(runqueue_t *this_rq, int this_cpu, int idle, int *imbalance, unsigned long cpumask)
{
int nr_running, load, max_load, i;
runqueue_t *busiest, *rq_src;
@@ -689,7 +689,7 @@ static inline runqueue_t *find_busiest_q
busiest = NULL;
max_load = 1;
for (i = 0; i < NR_CPUS; i++) {
- if (!cpu_online(i))
+ if (!cpu_online(i) || !((1 << i) & cpumask) )
continue;
rq_src = cpu_rq(i);
@@ -764,7 +764,8 @@ static void load_balance(runqueue_t *thi
struct list_head *head, *curr;
task_t *tmp;
- busiest = find_busiest_queue(this_rq, this_cpu, idle, &imbalance);
+ busiest = find_busiest_queue(this_rq, this_cpu, idle, &imbalance,
+ __node_to_cpu_mask(__cpu_to_node(this_cpu)) );
if (!busiest)
goto out;
---------------------------------------------------
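To make the effect concrete, here's a tiny user-space demo (not
kernel code; the helper macros are the illustrative ones sketched
above) of which runqueues CPU 5 would scan on a 16-CPU, 4-node box:

#include <stdio.h>

#define NR_CPUS		16
#define CPUS_PER_NODE	4
#define __cpu_to_node(cpu)	((cpu) / CPUS_PER_NODE)
#define __node_to_cpu_mask(node) \
	(((1UL << CPUS_PER_NODE) - 1) << ((node) * CPUS_PER_NODE))

int main(void)
{
	int this_cpu = 5;	/* a CPU on node 1 */
	unsigned long cpumask = __node_to_cpu_mask(__cpu_to_node(this_cpu));
	int i;

	for (i = 0; i < NR_CPUS; i++) {
		/* same test the patch adds to find_busiest_queue() */
		if (!((1UL << i) & cpumask))
			continue;
		printf("cpu %d: runqueue would be scanned\n", i);
	}
	return 0;	/* prints CPUs 4, 5, 6, 7 only */
}

Every remote runqueue gets skipped before its load is ever looked
at, which is where the reduced remote scanning comes from.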
A tiny change to the current ilb (initial load balancing) patch is
also needed to stop it from using a macro from the first patch:
diff -purN -X /home/mbligh/.diff.exclude ilbold/kernel/sched.c ilbnew/kernel/sched.c
--- ilbold/kernel/sched.c Thu Jan 9 15:20:53 2003
+++ ilbnew/kernel/sched.c Thu Jan 9 15:27:49 2003
@@ -2213,6 +2213,7 @@ static void sched_migrate_task(task_t *p
static int sched_best_cpu(struct task_struct *p)
{
int i, minload, load, best_cpu, node = 0;
+ unsigned long cpumask;
best_cpu = task_cpu(p);
if (cpu_rq(best_cpu)->nr_running <= 2)
@@ -2226,9 +2227,11 @@ static int sched_best_cpu(struct task_st
node = i;
}
}
+
minload = 10000000;
- loop_over_node(i,node) {
- if (!cpu_online(i))
+ cpumask = __node_to_cpu_mask(node);
+ for (i = 0; i < NR_CPUS; ++i) {
+ if (!(cpumask & (1 << i)))
continue;
if (cpu_rq(i)->nr_running < minload) {
best_cpu = i;
-
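(The tail of that diff got chopped in the mail. For clarity, the
rewritten search in sched_best_cpu() presumably ends up looking
roughly like this - a sketch only, with the minload update and
closing braces reconstructed from the shape of the existing code:)

	minload = 10000000;
	cpumask = __node_to_cpu_mask(node);
	for (i = 0; i < NR_CPUS; ++i) {
		/* skip CPUs outside the chosen node */
		if (!(cpumask & (1 << i)))
			continue;
		if (cpu_rq(i)->nr_running < minload) {
			minload = cpu_rq(i)->nr_running;
			best_cpu = i;
		}
	}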