Re: [ANNOUNCE/RFC] Really Fair Scheduler

From: Ingo Molnar
Date: Fri Aug 31 2007 - 06:55:14 EST



* Roman Zippel <zippel@xxxxxxxxxxxxxx> wrote:

> Hi,
>
> I'm glad to announce a working prototype of the basic algorithm I
> already suggested last time. As I already tried to explain previously
> CFS has a considerable algorithmic and computational complexity. [...]

hey, thanks for working on this, it's appreciated! In terms of merging
your stuff, your patch looks a bit large, and because you did not tell
us you were coding in this area, you probably missed Peter Zijlstra's
excellent CFS work:

http://programming.kicks-ass.net/kernel-patches/sched-cfs/

The following portion of Peter's series implements much of the core
math changes that your patch provides (and which make up most of the
conceptual delta in your patch), on a step-by-step basis:

sched-update_weight_inv.patch
sched-se_load.patch
sched-se_load-2.patch
sched-64-wait_runtime.patch
sched-scaled-limit.patch
sched-wait_runtime-scale.patch
sched-calc_delta.patch

So the most intrusive (math) aspects of your patch have already been
implemented for CFS (almost a month ago), in a fine-grained way.

Peter's patches change the CFS calculations gradually from
'normalized' to 'non-normalized' wait-runtime, to avoid the
normalizing/denormalizing overhead and rounding error. Turn off sleeper
fairness, remove the limit code, and we should arrive at something quite
close to the core math in your patch, with similar rounding properties
and similar overhead/complexity. (There are some other small details in
the math, but this is the biggest item by far.) I find Peter's series
very understandable, and he outlined the math arguments in his replies
to your review mails. (We would be glad to re-open those discussions
though, if you still think there are disagreements.)
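
To make the distinction concrete, here's a minimal sketch of the two
bookkeeping styles (purely illustrative - these are not Peter's actual
patches; the names mimic CFS's sched_entity fields, but the functions
are made up):

/*
 * Normalized: every accounting step rescales raw runtime into nice-0
 * units, so every step divides and can round - that per-step rounding
 * is what accumulates over time.
 */
static void charge_normalized(struct sched_entity *se, u64 delta_exec)
{
	se->wait_runtime -= delta_exec * NICE_0_LOAD / se->load.weight;
}

/*
 * Non-normalized: keep raw nanoseconds per task and apply the weight
 * only at comparison points - no division and no rounding in the
 * hot path.
 */
static void charge_raw(struct sched_entity *se, u64 delta_exec)
{
	se->wait_runtime -= delta_exec;
}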

Peter's work fully solves the rounding corner-cases you described as:

> This model is far more accurate than CFS is and doesn't add an error
> over time, thus there are no more underflow/overflow anymore within
> the described limits.

( Your characterisation errs in that it makes this appear to be a
common problem, while in practice it's only a corner-case limited to
extreme negative nice levels, and even there it needs a very high rate
of scheduling and an artificially constructed workload: several
hundred thousand context switches per second with a yield-ing loop, to
be even measurable with unmodified CFS. So this is not a 2.6.23 issue
at all - unless there's some testcase that proves the opposite. )
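
To see why such an extreme rate is needed, consider what a single
normalize/denormalize round trip can lose in integer math (an
illustrative helper, not kernel code):

/*
 * Integer division rounds down, so one normalize/denormalize round
 * trip loses between 0 and weight-1 raw units of wait_runtime. The
 * drift per second is bounded by the switch rate times that loss -
 * hence only extreme nice levels plus hundreds of thousands of
 * context switches per second make it measurable.
 */
static u64 roundtrip_loss(u64 wait_runtime, u64 weight)
{
	u64 norm   = wait_runtime / weight;	/* rounds toward zero */
	u64 denorm = norm * weight;

	return wait_runtime - denorm;		/* 0 .. weight - 1 */
}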

With Peter's queue there are no underflows/overflows anymore either, in
any synthetic corner-case we could come up with. Peter's queue works
well, but it's 2.6.24 material.

Non-normalized wait-runtime is simply a different unit (resulting in
slightly higher context-switch performance); the principle and the end
result do not change.

All in all, we don't disagree, this is an incremental improvement we
are thinking about for 2.6.24. We do disagree with this being
positioned as something fundamentally different though - it's just the
same thing mathematically, expressed without a "/weight" divisor,
resulting in no change in scheduling behavior. (except for a small
shift of CPU utilization in a synthetic corner-case)
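
One way to see the mathematical equivalence (owed_more() is a
hypothetical helper for illustration, not a CFS function): comparing
normalized values raw/weight yields the same ordering as
cross-multiplying the raw values, so the division can be dropped
entirely:

/*
 *   raw_a / weight_a > raw_b / weight_b
 *     <=>  raw_a * weight_b > raw_b * weight_a
 *
 * Dropping the "/weight" divisor changes the unit of the bookkeeping,
 * not the scheduling decision. (A real implementation would also have
 * to guard against 64-bit multiply overflow.)
 */
static int owed_more(u64 raw_a, u64 weight_a, u64 raw_b, u64 weight_b)
{
	return raw_a * weight_b > raw_b * weight_a;
}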

And if we handle that fundamental aspect via Peter's queue, all the
remaining changes you made can be done (and considered and merged)
evolutionarily instead of revolutionarily, on top of CFS - this should
cut down on the size and the impact of your changes significantly!

So if you'd like to work with us on this and get items that make sense
merged (which we'd very much like to see happen), could you please
re-shape the rest of your changes and ideas (such as whether to use
ready-queueing or a runqueue concept, which does look interesting) on
top of Peter's queue, and please do it as a fine-grained, reviewable,
mergeable series of patches, like Peter did. Thanks!

Ingo