Re: [RFC][PATCH][2.6.6] Replacing CPU scheduler active and expired with a single array

From: Peter Williams
Date: Sat May 29 2004 - 19:20:04 EST


Con Kolivas wrote:
On Sat, 29 May 2004 15:27, Peter Williams wrote:

Con Kolivas wrote:
> On Fri, 28 May 2004 19:24, Peter Williams wrote:
> > Ingo Molnar wrote:
> > > just try it - run a task that runs 95% of the time and sleeps 5%
> > > of the time, and run a (same prio) task that runs 100% of the
> > > time. With the current scheduler the slightly-sleeping task gets
> > > 45% of the CPU, the looping one gets 55% of the CPU. With your
> > > patch the slightly-sleeping process can easily monopolize 90% of
> > > the CPU!
> >
> > This does, of course, not take into account the interactive bonus.
> > If the task doing the shorter CPU bursts manages to earn a larger
> > interactivity bonus than the other then it will get more CPU but
> > isn't that the intention of the interactivity bonus?
>
> No. Ideally the interactivity bonus should decide what goes first
> every time to decrease the latency of interactive tasks, but the cpu
> percentage should remain close to the same for equal "nice" tasks.

There are at least two possible ways of viewing "nice": one is that it
indicates the task's entitlement to CPU resources (which is more or less
the view you describe); the other is that it indicates the task's
priority with respect to access to CPU resources.

If you wish the system to take the first of these views then the
appropriate solution to the scheduling problem is to use an
entitlement-based scheduler such as EBS (see
<http://sourceforge.net/projects/ebs-linux/>). EBS is much simpler than
the current O(1) scheduler and has the advantage that it gives pretty
good interactive responsiveness without treating interactive tasks
specially (although some modification in this regard may be desirable if
very high loads are going to be encountered).

If you want the second of these then this proposed modification is a
simple way of getting it (with the added proviso that starvation be
avoided).

Of course, there can be other scheduling aims, such as maximising
throughput, for which different scheduling paradigms need to be used. As
a matter of interest, these tend to give rather poor interactive
response.

If the system is an interactive system then all of these models (or at
least two of them) need to be modified to "break the rules" as far as
interactive tasks are concerned and give them higher priority in order
not to try human patience.

> Interactive tasks need low scheduling latency and short bursts of high
> cpu usage; not more cpu usage overall. When the cpu percentage differs
> significantly from this the logic has failed.

The only way this will happen is if the interactive bonus mechanism
misidentifies a CPU-bound task as an interactive task and gives it a
large bonus. This seems to be the case, as tasks with a 95% CPU demand
rate are being given a bonus of 9 (out of a possible 10) points.


> This is all a matter of semantics and I have no argument with it.
>
> I think your aims of simplifying the scheduler are admirable but I hope you don't suffer the quagmire that is manipulating the interactivity stuff.

As you surmise, this patch is just a starting point and there are some parts of it that may need to be fine-tuned.

For instance, the time slice currently used is set to the average that the current mechanism would have dispensed. Making this smaller would lessen the severity of the anomaly under discussion, but making it too small would increase the context switch rate. There is evidence from our kernbench results that we have room to decrease this value and still keep the context switch rate below that of the current scheduler (at least, for normal to moderately heavy loads). If possible, I'd like to get some statistics on the sleep/wake cycles of tasks on a typical system to help make a judgment about the best value here.
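
As a rough illustration of where that average sits (using the stock nice-to-timeslice mapping as I remember it; the MIN/MAX constants below are assumptions and may differ slightly in 2.6.6, so treat the numbers as indicative only):

/*
 * Back-of-envelope only: the constants and the linear mapping are my
 * recollection of the stock scheduler, not copied from sched.c.
 */
#include <stdio.h>

#define MIN_TIMESLICE_MS	10	/* assumed; may differ in 2.6.6 */
#define MAX_TIMESLICE_MS	200	/* assumed; may differ in 2.6.6 */
#define MAX_RT_PRIO		100
#define MAX_PRIO		140
#define MAX_USER_PRIO		(MAX_PRIO - MAX_RT_PRIO)	/* 40 nice levels */

static int timeslice_ms(int static_prio)
{
	/* linear map: nice -20 gets the maximum slice, nice +19 the minimum */
	return MIN_TIMESLICE_MS + (MAX_TIMESLICE_MS - MIN_TIMESLICE_MS) *
		(MAX_PRIO - 1 - static_prio) / (MAX_USER_PRIO - 1);
}

int main(void)
{
	int prio, sum = 0;

	for (prio = MAX_RT_PRIO; prio < MAX_PRIO; prio++)
		sum += timeslice_ms(prio);

	printf("nice 0 slice: %d ms, average over all nice levels: %d ms\n",
	       timeslice_ms(120), sum / MAX_USER_PRIO);
	return 0;
}

With those assumed constants the average comes out a little over 100ms, which gives a feel for how much room there is to shrink it before the context switch rate becomes a worry.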

Another area that needs more consideration is the determination of the promotion interval. At the moment, there's no promotion if there are fewer than 2 runnable tasks on a CPU; otherwise the interval is a constant multiplied by the number of runnable tasks.
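
In code form the current rule is simply the following (the base value here is a placeholder for illustration, not the constant actually used in the patch):

/*
 * Sketch of the promotion rule described above.  PROMOTION_BASE is a
 * placeholder value for illustration, not the constant from the patch.
 */
#define PROMOTION_BASE	25	/* placeholder, in timer ticks */

static unsigned long promotion_interval(unsigned long nr_running)
{
	if (nr_running < 2)
		return 0;	/* 0 meaning: no promotion needed */
	return PROMOTION_BASE * nr_running;
}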

Another area of investigation is (yet another) bonus intended to increase system throughput by minimizing (or at least attempting to minimize) the time tasks spend on the run queues. The principal difficulty here is making sure that this doesn't adversely affect interactive responsiveness, as it's an unfortunate fact of life that what's good for interactive response isn't necessarily (and usually isn't) good for maximizing throughput, and vice versa.
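
Before any such bonus could be tuned we'd need to measure the queueing delay per task; something along these lines would do (purely illustrative, and the names are invented for this sketch rather than taken from the patch):

/*
 * Purely illustrative: one way to measure the time tasks spend runnable
 * but waiting, which is the quantity such a bonus would try to drive
 * down.  The names here are invented, not taken from the patch.
 */
struct wait_stats {
	unsigned long long enqueued_at;	/* timestamp when made runnable */
	unsigned long long total_wait;	/* accumulated waiting time */
	unsigned long dispatches;	/* times selected to run */
};

static void note_enqueue(struct wait_stats *w, unsigned long long now)
{
	w->enqueued_at = now;
}

static void note_dispatch(struct wait_stats *w, unsigned long long now)
{
	w->total_wait += now - w->enqueued_at;
	w->dispatches++;
}

/* the average wait per dispatch is what a throughput bonus would target */
static unsigned long long average_wait(const struct wait_stats *w)
{
	return w->dispatches ? w->total_wait / w->dispatches : 0;
}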

Then, the interactive bonus mechanism might be examined but this is of low priority as the current one seems to do a reasonable job.
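
For reference, my understanding of what the current mechanism does with that bonus (a simplified rendering from memory, not a verbatim copy of kernel/sched.c, so the details may be off): a bonus of 9 shifts a task's dynamic priority 4 levels above its static priority, while a task with no sleep credit sits 5 levels below it.

/*
 * Simplified, from memory -- not copied from kernel/sched.c.  The sleep
 * average earns a task 0..MAX_BONUS points, and the bonus shifts its
 * dynamic priority up to MAX_BONUS/2 places either side of its static
 * (nice-derived) priority.
 */
#include <stdio.h>

#define MAX_BONUS	10
#define MAX_SLEEP_AVG	1000	/* assumed scale for the sleep average here */

static int current_bonus(int sleep_avg)
{
	return sleep_avg * MAX_BONUS / MAX_SLEEP_AVG;
}

static int effective_prio(int static_prio, int sleep_avg)
{
	return static_prio - (current_bonus(sleep_avg) - MAX_BONUS / 2);
}

int main(void)
{
	/* two nice 0 tasks (static priority 120): one with a 90% sleep
	 * average (bonus 9), one CPU-bound with no sleep credit at all */
	printf("bonus 9 task: prio %d, bonus 0 task: prio %d\n",
	       effective_prio(120, 900), effective_prio(120, 0));
	return 0;
}

That priority gap is presumably what lets the slightly-sleeping task in Ingo's example run ahead of the pure looper under the single-array scheme until promotion catches up, which is why it matters so much which tasks earn the bonus.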

Lastly, with the simplification of the scheduler I believe that it would be possible to make both the interactive response and throughput bonuses optional. An example of why this MIGHT BE desirable is that the interactive response bonus adversely affects throughput, so turning it off on servers where there are no interactive users may be worthwhile.

> Changing one value and saying it has no apparent effect is almost certainly wrong; surely it was put there for a reason - or rather I put it there for a reason.

Out of interest, what was the reason? What problem were you addressing?

Peter
--
Dr Peter Williams pwil3058@xxxxxxxxxxxxxx

"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/