Re: [ANNOUNCE/RFC] Really Simple Really Fair Scheduler

From: Ingo Molnar
Date: Tue Sep 04 2007 - 02:30:28 EST



* Roman Zippel <zippel@xxxxxxxxxxxxxx> wrote:

> > > It's a variation of the sleeper bonus. [...]
> >
> > hm, where are its effects described in your explanation? Seems like a
> > key item.
>
> It has no direct effect on the correctness of the mathematical model,
> the time is initialized before the time is added to the model. What
> you're after is the effect on the behaviour, which wasn't my focus, so
> sorry, I didn't mention it.

and what about the mirror image problem? Your math assumes that tasks
use up their full timeslices, somewhere starting at (12):

| (12) time_norm_app = sum_{t}^{T}(time_norm_{t} * weight_{t}) /
| sum_{t}^{T}(weight_{t})
|
| This produces only a approximate normalized time, but if all
| time_norm_{t} are equal (i.e. all tasks got their share), the result
| is the same, thus the error is only temporary.

>From here on the reasoning and the math is flawed i believe: it assumes
that tasks use up their timeslices - which is rarely the case.

In practical terms: your code causes unfairness to tasks that sleep when
they have used up only a fraction of their slice, when they are in a
fairness-deficit. For example consider a task that just waited alot to
get on the CPU, and when it finally got there (gathering a fairness
deficit of say 9 milliseconds), a hundred microseconds later it sleeps.

The problem is that when it wakes up and gets back on the runqueue your
code "forgets" that the task is in "need of fairness" by 9 milliseconds:
the task continues as if its previous starvation didnt happen at all.
This effect can cause _far more noise_ and even systematic starvation
than any numeric rounding effects could cause. (This could perhaps
explain the unfairness Mike reported, and this could perhaps explain the
noisy hackbench results i'm seeing with your patch - although i'm not
certain about this, i havent looked into those usecases in detail.)

CFS's current math addresses this problem via the use of the fair-clock
metric and via ->wait_runtime: that gives us a good idea what happened
to the system while the task slept. With your model, that information is
not maintained at all, and is not regainable. At the moment i dont see
how we could achieve good sleeper behavior with your model, you've
over-simplified the scheduler metrics and we've lost essential
information.

That's why i'm concentrating on the basic scheduling properties first,
without considering nice levels, rounding or optimizations - if that
basic model does not fit then a scheduler cannot work. That's also why
i've asked for a split-up patch series - it makes it far easier to
review and test the code and it makes it far easier to quickly apply the
obviously correct bits.

And i'd like to concur with Tong Li's observation that currently CFS is
a very generic fair scheduler upon which a multitude of scheduling
variants can be built and prototyped (we use a specific one right now,
but it's not cast into stone). The 30-lines-changed patch i sent shows
this property of CFS pretty well - and it already demonstrates the
issues we are talking about here.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/