Re: [RFC PATCH 00/11] sched: CFS low-latency features

From: Mathieu Desnoyers
Date: Thu Aug 26 2010 - 19:49:17 EST


* Peter Zijlstra (peterz@xxxxxxxxxxxxx) wrote:
> On Thu, 2010-08-26 at 14:09 -0400, Mathieu Desnoyers wrote:
> > Feedback is welcome,
> >
> So we have the following components to this patch:
>
> - dynamic min_vruntime -- push the min_vruntime ahead at the
> rate of the runqueue wide virtual clock. This approximates
> the virtual clock, esp. when turning off sleeper fairness.
> And is cheaper than actually computing the virtual clock.
>
> It allows for better insertion and re-weighting behaviour,
> but it does increase overhead somewhat.
>
> - special wakeups using the next-buddy to get scheduled 'soon',
> used by all wakeups from the input system and timers.
>
> - special fork semantics related to those special wakeups.
>
>
> So while I would love to simply compute the virtual clock, it would add
> an s64 mult to every enqueue/dequeue and an s64 div to each
> enqueue/re-weight, which might be somewhat prohibitive; the dyn
> min_vruntime approximation seems to work well enough and costs a u32 div
> per enqueue.

Yep, it's cheap enough and seems to work very well as far as my testing has
shown.
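
To make that concrete, here is roughly the model I have in mind (a simplified
sketch with made-up names, not the actual patch code): min_vruntime is pushed
forward by the elapsed wall time scaled by NICE_0_LOAD over the total queue
weight, which is where the single division per update comes from.

#include <stdint.h>

#define NICE_0_LOAD 1024ULL

struct cfs_rq_model {
	uint64_t load_weight;	/* sum of the weights of queued entities */
	uint64_t min_vruntime;	/* runqueue-wide virtual clock floor (ns) */
	uint64_t last_update;	/* wall clock of the previous update (ns) */
};

/*
 * Push min_vruntime ahead at the rate of the runqueue-wide virtual
 * clock: elapsed wall time scaled by NICE_0_LOAD / total queue weight.
 */
static void advance_min_vruntime(struct cfs_rq_model *rq, uint64_t now)
{
	uint64_t delta = now - rq->last_update;

	if (rq->load_weight)
		rq->min_vruntime += delta * NICE_0_LOAD / rq->load_weight;

	rq->last_update = now;
}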

> Adding a preference to all user generated wakeups (input) and
> propagating that state along the wakeup chain seems to make sense,

Yes, this is what lets us kill FAIR_SLEEPERS (and thus let the dynamic
min_vruntime behave as expected), while keeping good Xorg interactivity.

> adding the same to all timers is something that needs to be discussed; I
> can well imagine not all timers are equally important -- do we want to
> extend the timer interface?

I just thought it made sense that when a timer fires and wakes up a thread,
chances are pretty good that we want to wake that thread up quickly. But it
raises the question in a more general sense: would we want this kind of
behavior to be available for network packets, disk I/O, etc. as well? IOW,
would it make sense to have next-buddy selection on all these input-triggered
wakeups? Since we only set the next buddy, the scheduler will only honor that
selection if the buddy is within a bounded vruntime range of the leftmost
task, so AFAIK we should not end up starving the system in any way.
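
For reference, this is the simplified model of that gating I have in mind
(illustrative names and constants, not the kernel code): the buddy only
overrides the leftmost entity when its vruntime lead stays within a small
granularity, which is what bounds how much it can push the rest of the queue
back.

#include <stdint.h>

struct entity {
	int64_t vruntime;	/* virtual runtime (ns) */
};

/* Illustrative bound, roughly one wakeup granularity worth of vruntime. */
static const int64_t BUDDY_GRAN = 1000000;	/* 1 ms */

/*
 * Prefer the next buddy over the leftmost (smallest vruntime) entity,
 * but only when the buddy has not already run too far ahead in virtual
 * time; otherwise fall back to the leftmost, so nothing can starve.
 */
static struct entity *pick_next(struct entity *leftmost, struct entity *buddy)
{
	if (buddy && buddy->vruntime - leftmost->vruntime <= BUDDY_GRAN)
		return buddy;

	return leftmost;
}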

So far I cannot see a situation where selecting the next buddy would _not_ make
sense for any kind of input-driven wakeup (interactive, timer, disk, network,
etc.). But maybe it's just a lack of imagination on my part.


> If we do decide we want both, we should at the very least merge the
> try_to_wake_up() conditional blob (they're really identical). Preferably
> we should reduce ttwu(), not add more to it...

Sure. Or maybe we'll find out that we want to target even more input paths...
So far, interactive input seemed absolutely needed, and timers seemed logical
and self-rate-limited. For the others, I don't know; it might actually make
sense too.

>
> Fudging fork seems dubious at best; it seems generated by the use of
> timer_create(.evp->sigev_notify = SIGEV_THREAD), which is a really
> broken thing to do: it has very ill-defined semantics and is utterly
> unable to properly cope with error cases. Furthermore, it's trivial to
> actually correctly implement the desired behaviour, so I'm really
> skeptical on this front; friends don't let friends use SIGEV_THREAD.

Agreed for the timer-tied case. I think the direction Thomas proposes for fixing
glibc makes much more sense. However, I am wondering about the interactive case
too: e.g., if you click on "open terminal", a process needs to be forked, and you
would normally expect it to come up more quickly than the background kernel
compile you're doing. So there might be some interest in this fork vruntime
boost for interactivity-driven wakeups. Thoughts?
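
For the timer-tied case, the SIGEV_THREAD-free setup Peter alludes to could be
as simple as a single long-lived thread blocking on a timerfd (just a sketch;
timerfd_create/timerfd_settime are the mainline API, the rest is illustrative
scaffolding, not anything from glibc or the patch set):

#include <sys/timerfd.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static void *timer_thread(void *arg)
{
	int tfd = *(int *)arg;
	uint64_t ticks;

	/* read() blocks until the timer expires and returns the number of
	 * expirations since the previous read. */
	while (read(tfd, &ticks, sizeof(ticks)) == sizeof(ticks))
		printf("timer fired %llu time(s)\n", (unsigned long long)ticks);

	return NULL;
}

int main(void)
{
	struct itimerspec its = {
		.it_value    = { .tv_sec = 1 },	/* first expiry in 1s */
		.it_interval = { .tv_sec = 1 },	/* then fire every second */
	};
	int tfd = timerfd_create(CLOCK_MONOTONIC, 0);
	pthread_t tid;

	timerfd_settime(tfd, 0, &its, NULL);
	pthread_create(&tid, NULL, timer_thread, &tfd);
	pthread_join(tid, NULL);

	return 0;
}

One thread, created once, owning the timer, instead of a notification thread
being spawned behind our back; errors from the timer syscalls get reported
right where they happen.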

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com