Re: scheduler oddity [bug?]

From: Ingo Molnar
Date: Sun Mar 08 2009 - 14:55:44 EST



* Mike Galbraith <efault@xxxxxx> wrote:

> On Sun, 2009-03-08 at 18:52 +0100, Ingo Molnar wrote:
> > * Mike Galbraith <efault@xxxxxx> wrote:
> >
> > > On Sun, 2009-03-08 at 16:39 +0100, Ingo Molnar wrote:
> > > > * Mike Galbraith <efault@xxxxxx> wrote:
> > > >
> > > > > The problem with your particular testcase is that while one
> > > > > half has an avg_overlap (what we use as affinity hint for
> > > > > synchronous wakeups) which triggers the affinity hint, the
> > > > > other half has avg_overlap of zero, what it was born with, so
> > > > > despite significant execution overlap, the scheduler treats
> > > > > them as if they were truly synchronous tasks.
> > > >
> > > > hm, why does it stay on zero?
> > >
> > > Wakeup preemption. Presuming here: heavy task wakes light
> > > task, is preempted, light task stuffs data into pipe, heavy
> > > task doesn't block, so no avg_overlap is ever computed. The
> > > heavy task uses 100% CPU.
> > >
> > > Running as SCHED_BATCH (virgin source), it becomes sane.
> >
> > ah.
> >
> > I'd argue then that time spent on the rq preempted _should_
> > count in avg_overlap statistics. I.e. couldn't we do something
> > like ... your patch? :)
> >
> > > >   if (sleep && p->se.last_wakeup) {
> > > >           update_avg(&p->se.avg_overlap,
> > > >                      p->se.sum_exec_runtime - p->se.last_wakeup);
> > > >           p->se.last_wakeup = 0;
> > > > - }
> > > > + } else if (p->se.avg_overlap < limit && runtime >= limit)
> > > > +         update_avg(&p->se.avg_overlap, runtime);
> >
> > Just done unconditionally, i.e. something like:
> >
> >     if (sleep) {
> >             runtime = p->se.sum_exec_runtime - p->se.last_wakeup;
> >             p->se.last_wakeup = 0;
> >     } else {
> >             runtime = p->se.sum_exec_runtime - p->se.prev_sum_exec_runtime;
> >     }
> >
> >     update_avg(&p->se.avg_overlap, runtime);
> >
> > ?
>
> That'll do it for this load. I'll resume in the a.m., give
> that some testing, and try to remember all the things I was
> paranoid about.

btw., there's room for a cleanup + micro-optimization here too:
it would be nice to change se.last_wakeup and
se.prev_sum_exec_runtime to be the same variable,
se.prev_timestamp or so.

That way we can do a simple:

    update_avg(&p->se.avg_overlap,
               p->se.sum_exec_runtime - p->se.prev_timestamp);
    p->se.prev_timestamp = 0;

the latter is needed as we rely on the zeroing here:

kernel/sched.c: if (sleep && p->se.last_wakeup) {


Ingo