Re: [RFC][PATCH] 2.5.0 Multi-Queue Scheduler

From: Alan Cox (
Date: Sat Dec 08 2001 - 06:58:22 EST

> maintain the behavior of the existing scheduler. This was both
> good and bad. The good is that we did not need to be concerned
> with load balancing like traditional multi-queue schedulers.

Which version of the scheduler do you behave like ?

> balancing. This is what Davide Libenzi is working on and
> something I plan to tackle next. If you have any ideas on
> areas where our time would be better spent, I would be happy
> to hear them.

I've played with your code, and some other ideas too. I don't think the
Linus scheduler is salvagable. Its based upon assumptions about access to
data and scaling that broke when the pentium came out let alone the quad

I have the core bits of a scheduler that behaves roughly like Linux 2.4 with
the recent Ingo cache change [in fact that came from working out what the
Linus scheduler does].

Uniprocessor scheduling is working fine (I've not tackled the RT stuff tho)
SMP I'm pondering bits still.

Currently I do the following

Two sets of 8 queues per processor, and a bitmask of non empty queues.

Picking a new task to run is as simple as
        new = cpu->run_queue[fastffz[cpu->runnable]];

(fastffz is a lookup table, for more queues you want a real ffz which is
 a few clocks more but might be worth it on very big boxen)

sleep simply doesn't put the task back on a run queue

wake_up (uniprocessor) is

        if(task->epoch != epoch)
        queue_task(cpu, task);
        if(test_and_set_bit(task->pri, cpu->runnable))

Running out of ticks (as with the original Linus scheduler - Ingo edition)
bumps you off run_queue onto next_queue. run/next queue exchange when there
is nothing left to run, but next_queue has stuff (ie non idle). At that
point we switch them and bump epoch.

epoch is a trick I took from the montavista patches - instead of iterating
every task in the system to adjust priority, we work out what it would have
been by adjusting at wake up.

I don't see how to get an O(roughly 1) scheduler without changing behaviour.
At the same time I agree with Linus that a lot of boxes typically have < 3
runnable tasks so we can't make the simple case suffer.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at
Please read the FAQ at

This archive was generated by hypermail 2b29 : Sat Dec 15 2001 - 21:00:12 EST