Re: Linux scheduler (fwd)

Adam McKee (amckee@poboxes.com)
Wed, 18 Feb 1998 16:07:26 -0600 (CST)


I am currently hacking up a version of the scheduler that prevents
starvation, since I have come to believe it is unacceptable to end up
rebooting your machine just because of an errant program or a misguided
re-nicing by the super-user. I used to think that the solution for this
was to leave an "emergency" getty running on the machine at run-queue 0,
but this is not a very practical or user-friendly solution.

The approach I am taking to prevent starvation is intentionally simple:
tasks can, due to starvation, be promoted beyond their "minimum"
run-queue, all the way to run-queue 0. Sooner or later, they will be
awarded a timeslice by the scheduler, and when that is used up, they will
be returned to their minimum run-queue.
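The promotion/demotion rule above can be sketched in a few lines of C. This is a toy model, not the patch itself: the queue count, tick granularity, and `MAX_WAIT` threshold are all assumed names for illustration.

```c
#include <assert.h>

#define NR_QUEUES 8     /* hypothetical number of run-queues; 0 is highest */
#define MAX_WAIT  100   /* assumed starvation threshold, in ticks */

struct task {
    int min_queue;      /* run-queue implied by the task's nice value */
    int cur_queue;      /* run-queue the task currently sits on       */
    int wait_ticks;     /* ticks spent runnable without getting CPU   */
};

/* Called periodically for every runnable task that received no CPU:
 * a starving task is promoted one level toward run-queue 0. */
void age_task(struct task *t)
{
    if (++t->wait_ticks >= MAX_WAIT && t->cur_queue > 0) {
        t->cur_queue--;
        t->wait_ticks = 0;
    }
}

/* Called when the task's awarded timeslice is over: the task goes
 * straight back to its minimum run-queue, unconditionally. */
void slice_done(struct task *t)
{
    t->cur_queue = t->min_queue;
    t->wait_ticks = 0;
}
```

A starved task therefore climbs level by level until it reaches run-queue 0, runs once, and drops back where its nice value puts it.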

Just a few comments:

> *NEW* Every level down, the length of the timeslice doubles,
> so a 10% CPU task will remain 'stuck' at a certain level, and
> thus it will _always_ have a higher priority than any idle-eater,
> and have near-realtime performance (compared to idle-eaters).
> This might be a good way to get more-than-just-reasonable performance
> for games and mod-players on multiuser machines. This won't
> reduce the number of context switches, since a long-slice
> process _will_ be interrupted by high-priority processes. It's
> just that after the interruption the system will continue with
> the same low-prio task, and not with its competitor...

Such drastically varied timeslice lengths are likely to have very
unwelcome consequences.

> *NEW* Tasks get promoted above their maximum priority, when
> they suffer from starvation, but this is for one full timeslice
> only... OTOH, when it never finishes a timeslice, it might just
> remain in the high queue (so a +20 niced vi still gets good
> performance :-) For this the max-wait might be multiplied
> by the nice level the process is on.

This is what I've done, but I immediately return the process to its
minimum run-queue regardless of whether it uses its entire slice.
The whole aim here is to allow the super-user to regain control of the
machine, and prevent processes on run-queues > 0 from being totally
starved.

In writing and maintaining this patch, I've gained appreciation of two
things with respect to kernel scheduling algorithms:

o it's easy to design beautiful solutions for specific scheduling
situations, but the real art is to come up with an algorithm
that achieves good all-around performance, and doesn't totally
break down in "un-anticipated" scheduling scenarios.

o a good scheduling algorithm doesn't allow user-space code to
"trick" it in order to obtain significantly more CPU time.

For example, suppose we allowed processes to remain below their minimum
run-queue as long as they never used up an entire timeslice. A malevolent
program could then intentionally use, say, 95% of each timeslice, but
never the whole thing, so as to avoid ever being sent back to its minimum
run-queue. For a process like this, starvation could be the best thing
that ever happened to it... a one-way ticket to run-queue 0 :-)
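The trick is easy to sketch side by side with the strict rule (a toy model with hypothetical names, not code from the patch):

```c
#include <assert.h>
#include <stdbool.h>

struct task { int min_queue, cur_queue; };

/* Policy A (gameable): only demote a task that used its whole slice.
 * A hog that yields at 95% of each slice is never demoted. */
void end_of_slice_gameable(struct task *t, bool used_whole_slice)
{
    if (used_whole_slice)
        t->cur_queue = t->min_queue;
}

/* Policy B (this patch): always demote back to the minimum queue,
 * whether or not the slice was fully used. */
void end_of_slice_strict(struct task *t, bool used_whole_slice)
{
    (void)used_whole_slice;
    t->cur_queue = t->min_queue;
}
```

Under policy A, user-space behavior (yielding just before slice expiry) buys a permanent stay at run-queue 0; under policy B there is nothing to game, because the demotion is unconditional.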

-------------------------------------------------------------------------------
Adam McKee phone: (306) 343-0881 (home)
(306) 933-1544 (work)
Programmer/Analyst (306) 956-4070 (voice-mail/pager)
SED Systems, Inc. email: amckee@poboxes.com
===============================================================================
Just say NO to IE 4.0. <http://www.sun.com/announcement/>
===============================================================================

On Tue, 17 Feb 1998, Rik van Riel wrote:

> On Mon, 16 Feb 1998, Andrew J. Anderson wrote:
>
> > "quantum expiration policy" that reduces your CPU window if
> > you get an involuntary context switch -- this keeps CPU hogs from choking
> > the machine.
> >
> > "sleep return priority" that is generally higher than the current priorty
> > (this keeps interactive performance good) -- the idea being that the
> > process blocked on IO and is now ready to do something.
> >
> > "maximum wait" that defines how long a process will wait on the current
> > run queue level before being escalated to a higher level. (This is how
> > Solaris addresses starvation.)
> >
> > "long wait priority" that handles nice'd processes to make sure that they
> > get some CPU time. This is roughly double the maximum wait by default.
>
> I've been suggesting something like this to the
> QNX-sched author as well. But in a more non-tunable
> way. It lacked the OS/2 like maximum wait, however.
>
> I can remember something like this from the linux-1.2
> days (kswap reduced the timeslice to 4*HZ/loadavg to
> have better scheduling).
>
> Here are two scheduling algorithms that might be
> worthwhile for Linux. Personally, I prefer the second
> one, but the first one might be implemented with the
> least change to the current kernel.
>
> --------------> BSD-and-OSF-like-algorithm <-----------
>
> I think I (or we?) really should look into this and provide:
> - on-the-fly priority changes (not just when all timeslices
> are finished) a-la BSD
> - max-wait extra-extra bonus. (nice +20 --> max-wait *= 20 ??)
> - SCHED_BG extra-low-priority without max-wait
> - really low latency for interactive processes !!!!! Solaris
> and OS/2 haven't done this properly, thereby earning their
> nicknames Slowlaris and 'half an OS'.
> - very long timeslices for processes niced >= 10, so their
> influence on the VM subsystem is smaller than it is now
>
> -----------> QNX-and-CTSS-like-algorithm <----------------
> A-la QNX, a process in the highest priority ring is
> executed and demoted once its timeslice is over (if
> it uses its entire timeslice).
>
> Only when no processes remain up there does the
> scheduler move down to the lower queues.
>
> QNX-sched like demotions / promotions.
>
> *NEW* When a task reaches the max-wait value, it gets
> (for one slice) promoted to real-time priority, _and_
> the _system-wide_ timeslice length gets halved for 5
> seconds, so the system can overcome temporary extra
> loads.
>
> *NEW* Introduction of a SCHED_BG task without a max-wait
> value. There's no way to rescue your vi from this... So
> maybe we shouldn't introduce this feature :-)
>
> *NEW* Euhmm, what did I forget?
>
>
> The existing QNX-sched patch is probably a good base
> for the second approach. Somebody just might code it
> up in half a week :-)
>
> Rik.
> +-----------------------------+------------------------------+
> | For Linux mm-patches, go to | "I'm busy managing memory.." |
> | my homepage (via LinuxHQ). | H.H.vanRiel@fys.ruu.nl |
> | ...submissions welcome... | http://www.fys.ruu.nl/~riel/ |
> +-----------------------------+------------------------------+
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu