[patch] SMP scheduler improvements

Andrea Arcangeli (andrea@suse.de)
Thu, 19 Aug 1999 16:42:25 +0200 (CEST)


This patch improves the SMP scheduler a lot. I have had some feedback
from people about it (you will also find some numbers on linux-kernel
in the followup to the "SMP scheduler" thread of last week).

Basically, I check whether the preferred CPU is idle. If so, I
reschedule it. If it is not idle, I check whether there is some idle
CPU and whether our avg_slice is long enough; if so, I reschedule that
idle CPU. If our avg_slice is not long enough, or if there are no idle
CPUs, I try to reschedule the preferred (non-idle) CPU by looking at
its goodness. If it is not possible to preempt that best CPU, I
reschedule a random idle CPU instead (if there is at least one idle
CPU). The sketch below makes the order of the checks explicit.
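
Here is a minimal userspace sketch of that decision order (not the
patch itself: pick_cpu(), wakeup_goodness and struct task are names
made up for this example; the real code walks the runqueue under
runqueue_lock):

	#include <stdio.h>

	#define NCPUS		4
	#define NO_TARGET	-1

	/* made-up stand-in for the per-CPU state the kernel keeps */
	struct task {
		int idle;	/* 1 if this CPU runs its idle task */
		int goodness;	/* goodness of the task running there */
	};

	/*
	 * Decision order described above: preferred CPU if idle; else
	 * any idle CPU, but only when avg_slice is longer than
	 * cacheflush_time; else try to preempt the preferred CPU on
	 * goodness; else fall back to an idle CPU (if any).
	 */
	static int pick_cpu(struct task cpu[], int preferred,
			    unsigned long avg_slice,
			    unsigned long cacheflush_time,
			    int wakeup_goodness)
	{
		int i, idle_cpu = NO_TARGET;

		if (cpu[preferred].idle)
			return preferred;

		for (i = 0; i < NCPUS; i++)
			if (cpu[i].idle)
				idle_cpu = i;

		if (idle_cpu != NO_TARGET && avg_slice > cacheflush_time)
			return idle_cpu;

		/* positive difference models preemption_goodness() > 0 */
		if (wakeup_goodness - cpu[preferred].goodness > 0)
			return preferred;

		return idle_cpu;	/* may be NO_TARGET: nothing to do */
	}

	int main(void)
	{
		struct task cpu[NCPUS] = { {0, 10}, {1, 0}, {0, 5}, {0, 8} };

		/* preferred CPU 0 is busy, CPU 1 is idle, avg_slice is long */
		printf("target cpu: %d\n", pick_cpu(cpu, 0, 20000, 10000, 7));
		return 0;
	}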

The old code has places that make no sense at all (to me), as I have
been complaining since pre-2.2.8-1.

My algorithm seems to work great in practice (and it is certainly 100%
stable).

Note that this patch needs current->processor to always be a valid
value (not NO_PROC_ID).
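
The reason is that the new reschedule_idle() trusts p->processor
blindly as the preferred CPU. A tiny standalone illustration of the
invariant (preferred_cpu() is a made-up helper; NO_PROC_ID is the "no
CPU" marker, 0xff in the x86 <asm/smp.h>):

	#include <assert.h>

	#define NO_PROC_ID	0xff	/* "no CPU" marker, as in <asm/smp.h> */

	struct task_struct { int processor; };

	/* hypothetical helper: the patch trusts this value blindly */
	static int preferred_cpu(struct task_struct *p)
	{
		assert(p->processor != NO_PROC_ID);	/* must always hold */
		return p->processor;
	}

	int main(void)
	{
		struct task_struct p = { 1 };	/* set at fork/wakeup time */
		return preferred_cpu(&p) == 1 ? 0 : 1;
	}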

The patch below is against 2.2.11, but it will also apply cleanly
against 2.3.14-pre2.

--- 2.2.11/kernel/sched.c	Tue Jul 13 00:33:10 1999
+++ 2.2.11-sched/kernel/sched.c	Thu Aug 12 01:33:24 1999
@@ -211,69 +211,14 @@
 	return goodness(prev, p, cpu) - goodness(prev, prev, cpu);
 }
 
-/*
- * If there is a dependency between p1 and p2,
- * don't be too eager to go into the slow schedule.
- * In particular, if p1 and p2 both want the kernel
- * lock, there is no point in trying to make them
- * extremely parallel..
- *
- * (No lock - lock_depth < 0)
- *
- * There are two additional metrics here:
- *
- * first, a 'cutoff' interval, currently 0-200 usecs on
- * x86 CPUs, depending on the size of the 'SMP-local cache'.
- * If the current process has longer average timeslices than
- * this, then we utilize the idle CPU.
- *
- * second, if the wakeup comes from a process context,
- * then the two processes are 'related'. (they form a
- * 'gang')
- *
- * An idle CPU is almost always a bad thing, thus we skip
- * the idle-CPU utilization only if both these conditions
- * are true. (ie. a 'process-gang' rescheduling with rather
- * high frequency should stay on the same CPU).
- *
- * [We can switch to something more finegrained in 2.3.]
- *
- * do not 'guess' if the to-be-scheduled task is RT.
- */
-#define related(p1,p2) (((p1)->lock_depth >= 0) && (p2)->lock_depth >= 0) && \
-	(((p2)->policy == SCHED_OTHER) && ((p1)->avg_slice < cacheflush_time))
-
-static inline void reschedule_idle_slow(struct task_struct * p)
+static void reschedule_idle(struct task_struct * p)
 {
 #ifdef __SMP__
-/*
- * (see reschedule_idle() for an explanation first ...)
- *
- * Pass #2
- *
- * We try to find another (idle) CPU for this woken-up process.
- *
- * On SMP, we mostly try to see if the CPU the task used
- * to run on is idle.. but we will use another idle CPU too,
- * at this point we already know that this CPU is not
- * willing to reschedule in the near future.
- *
- * An idle CPU is definitely wasted, especially if this CPU is
- * running long-timeslice processes. The following algorithm is
- * pretty good at finding the best idle CPU to send this process
- * to.
- *
- * [We can try to preempt low-priority processes on other CPUs in
- * 2.3. Also we can try to use the avg_slice value to predict
- * 'likely reschedule' events even on other CPUs.]
- */
 	int this_cpu = smp_processor_id(), target_cpu;
 	struct task_struct *tsk, *target_tsk;
-	int cpu, best_cpu, weight, best_weight, i;
+	int cpu, best_cpu, weight, i;
 	unsigned long flags;
 
-	best_weight = 0; /* prevents negative weight */
-
 	spin_lock_irqsave(&runqueue_lock, flags);
 
 	/*
@@ -289,15 +234,17 @@
 	for (i = 0; i < smp_num_cpus; i++) {
 		cpu = cpu_logical_map(i);
 		tsk = cpu_curr(cpu);
-		if (related(tsk, p))
-			goto out_no_target;
-		weight = preemption_goodness(tsk, p, cpu);
-		if (weight > best_weight) {
-			best_weight = weight;
+		if (tsk == idle_task(cpu))
 			target_tsk = tsk;
-		}
 	}
 
+	if (target_tsk && p->avg_slice > cacheflush_time)
+		goto send_now;
+
+	tsk = cpu_curr(best_cpu);
+	if (preemption_goodness(tsk, p, best_cpu) > 0)
+		target_tsk = tsk;
+
 	/*
 	 * found any suitable CPU?
 	 */
@@ -326,35 +273,6 @@
 	if (preemption_goodness(tsk, p, this_cpu) > 0)
 		tsk->need_resched = 1;
 #endif
-}
-
-static void reschedule_idle(struct task_struct * p)
-{
-#ifdef __SMP__
-	int cpu = smp_processor_id();
-	/*
-	 * ("wakeup()" should not be called before we've initialized
-	 *   SMP completely.
-	 *   Basically a not-yet initialized SMP subsystem can be
-	 *   considered as a not-yet working scheduler, simply dont use
-	 *   it before it's up and running ...)
-	 *
-	 * SMP rescheduling is done in 2 passes:
-	 *  - pass #1: faster: 'quick decisions'
-	 *  - pass #2: slower: 'lets try and find a suitable CPU'
-	 */
-
-	/*
-	 * Pass #1. (subtle. We might be in the middle of __switch_to, so
-	 *  to preserve scheduling atomicity we have to use cpu_curr)
-	 */
-	if ((p->processor == cpu) && related(cpu_curr(cpu), p))
-		return;
-#endif /* __SMP__ */
-	/*
-	 * Pass #2
-	 */
-	reschedule_idle_slow(p);
 }
 
 /*

Andrea
