[patch] O(1) scheduler, -I0

From: Ingo Molnar (mingo@elte.hu)
Date: Tue Jan 15 2002 - 11:04:29 EST


the -I0 patch is available at:

    http://redhat.com/~mingo/O(1)-scheduler/sched-O1-2.5.2-final-I0.patch

stock 2.5.2 includes an 'interactivity estimator' method that covers most
of the things i think are important for good interactivity:

 - sleep time based priority boost/penalty.

 - constant frequency runqueue sampling instead of recalculation/switch
   based runqueue sampling.

 - interactivity based runqueue insertion on timeslice expire.

I'm very happy with the 2.5.2 solution: it's simpler than the one i used
in -H7 - good work Davide!

There are a number of problems in 2.5.2 that need fixing though:

 - renicing is broken - it does not work at all, neither up nor down, for
   CPU-bound tasks. Renicing fell victim to the attempt to penalize CPU
   hogs as much as possible: every CPU-bound task reaches the lowest
   priority level and stays there. This also makes kernel compile times
   suffer.

 - RT scheduling is broken.

 - the sleep average is hidden in p->prio, which makes it harder to
   recover and use the true interactiveness of the task.

 - the runqueue is sampled at a frequency of 20 HZ, which can misdetect
   periodic user tasks that somehow correlate with 20 HZ.
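
To illustrate the sampling problem in the last item, here is a minimal
user-space sketch (not code from the patch). It models a hypothetical
periodic task that runs for 10 msecs out of every 50 msecs and counts
how often a fixed-rate sampler catches it running. All names here
(task_is_running(), sampled_cpu_percent()) and the 100 Hz comparison
rate are assumptions made for illustration only.

```c
#include <assert.h>

/*
 * Hypothetical periodic task: it runs for run_ms at the start of every
 * period_ms; phase_ms shifts the task relative to the sampler.
 */
static int task_is_running(int t_ms, int period_ms, int run_ms, int phase_ms)
{
    return ((t_ms + phase_ms) % period_ms) < run_ms;
}

/*
 * Sample the task at sample_hz for window_ms and return the percentage
 * of samples that found it running.
 */
static int sampled_cpu_percent(int sample_hz, int window_ms,
                               int period_ms, int run_ms, int phase_ms)
{
    int step_ms, t, hits = 0, samples = 0;

    /* keep the model simple: the sample interval is a whole number of msecs */
    assert(sample_hz > 0 && 1000 % sample_hz == 0);
    assert(window_ms > 0);
    step_ms = 1000 / sample_hz;

    for (t = 0; t < window_ms; t += step_ms) {
        hits += task_is_running(t, period_ms, run_ms, phase_ms);
        samples++;
    }
    return 100 * hits / samples;
}
```

In this model the task really uses 20% of the CPU. A 20 Hz sampler sees
either 100% or 0% depending on the phase, because every sample lands on
the same point of the task's 50 msec period. A 100 Hz sampler sees 20%
for this task, though a task whose period matches the faster sampling
interval could still be misjudged.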

I've fixed these problems/bugs by taking some of the -H7 solutions:

 - introducing p->sleep_avg, which is updated in a lightweight way. No
   more 'history slots'. A single counter, updated in a very simple way.

 - limiting the bonus/penalty range according to nice levels - a task can
   get at most a 5 priority level penalty over its default level. In
   stock 2.5.2 it can reach the nice +19 level after a few seconds of
   runtime. Nice levels work again.

 - introducing HZ frequency runqueue sampling. Also the MAX_SLEEP_AVG
   constant tells us how long into the past we are looking. This is 2
   seconds right now.

 - separating the RT timeslice code in scheduler_tick(). We used to break
   the RT case way too often; now we can hack the SCHED_OTHER code without
   having to touch the RT part.

 - plus the patch also includes all the fixes and improvements from the
   -H7 patch.
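
The single-counter idea and the bounded bonus/penalty range described
in this list can be sketched in user-space C as follows. This is a
rough illustration under stated assumptions, not the code in the patch:
only the single counter, the 2 second MAX_SLEEP_AVG window and the
5 level penalty cap come from the description above. HZ = 100, the
symmetric 5 level boost, the linear mapping, struct task_sketch and the
helper names are all hypothetical.

```c
#include <assert.h>

#define HZ            100        /* assumed tick rate */
#define MAX_SLEEP_AVG (2 * HZ)   /* look 2 seconds into the past */
#define MAX_BONUS     5          /* 5 level cap; the symmetric boost is assumed */
#define NICE_MIN      (-20)
#define NICE_MAX      19

/* hypothetical stand-in for the relevant task fields */
struct task_sketch {
    int nice;        /* static level set by the user */
    int sleep_avg;   /* single counter, 0..MAX_SLEEP_AVG ticks */
};

/* called once per timer tick while the task is running */
static void tick_running(struct task_sketch *p)
{
    if (p->sleep_avg > 0)
        p->sleep_avg--;
}

/* called at wakeup with the number of ticks the task slept */
static void woke_after_sleep(struct task_sketch *p, int slept_ticks)
{
    assert(slept_ticks >= 0);
    if (slept_ticks > MAX_SLEEP_AVG)
        slept_ticks = MAX_SLEEP_AVG;   /* also avoids integer overflow */
    p->sleep_avg += slept_ticks;
    if (p->sleep_avg > MAX_SLEEP_AVG)
        p->sleep_avg = MAX_SLEEP_AVG;
}

/*
 * Map sleep_avg linearly onto a bonus of -MAX_BONUS..+MAX_BONUS and
 * apply it to the nice level; the result stays inside the nice range.
 */
static int effective_nice(const struct task_sketch *p)
{
    int bonus = (2 * MAX_BONUS * p->sleep_avg) / MAX_SLEEP_AVG - MAX_BONUS;
    int level = p->nice - bonus;

    if (level < NICE_MIN)
        level = NICE_MIN;
    if (level > NICE_MAX)
        level = NICE_MAX;
    return level;
}
```

With these assumptions a nice 0 CPU hog bottoms out at level +5 instead
of drifting to +19, and a nice +10 hog ends up at +15, so the relative
order set by renicing is preserved. The prio_effective(p) function in
the patch may compute this differently.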

i've also cleaned up and commented the priority management code and have
introduced the prio_effective(p) inline function.

i've tested the patch on UP and SMP boxes. I've measured high-load
interactivity to be equivalent to that of stock 2.5.2.

Bug reports, comments, suggestions welcome.

        Ingo
