Re: CFS review

From: Al Boldi
Date: Wed Aug 29 2007 - 00:20:35 EST


Ingo Molnar wrote:
> * Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > On Tue, 28 Aug 2007, Al Boldi wrote:
> > > I like your analysis, but how do you explain that these stalls
> > > vanish when __update_curr is disabled?
> >
> > It's entirely possible that what happens is that the X scheduling is
> > just a slightly unstable system - which effectively would turn a small
> > scheduling difference into a *huge* visible difference.
>
> i think it's because disabling __update_curr() in essence removes the
> ability of scheduler to preempt tasks - that hack in essence results in
> a non-scheduler. Hence the gears + X pair of tasks becomes a synchronous
> pair of tasks in essence - and thus gears cannot "overload" X.

I have narrowed it down a bit to add_wait_runtime.

Patch 2.6.22.5-v20.4 like this:

346- * the two values are equal)
347- * [Note: delta_mine - delta_exec is negative]:
348- */
349:// add_wait_runtime(cfs_rq, curr, delta_mine - delta_exec);
350-}
351-
352-static void update_curr(struct cfs_rq *cfs_rq)

When disabling add_wait_runtime the stalls are gone. With this change the
scheduler is still usable, but it does not constitute a fix.

Now, even with this hack, uneven nice-levels between X and gears causes a
return of the stalls, so make sure both X and gears run on the same
nice-level when testing.

Again, the whole point of this workload is to expose scheduler glitches
regardless of whether X is broken or not, and my hunch is that this problem
looks suspiciously like an ia-boosting bug. What's important to note is
that by adjusting the scheduler we can effect a correction in behaviour, and
as such should yield this problem as fixable.

It's probably a good idea to look further into add_wait_runtime.


Thanks!

--
Al

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/