Re: [RFC] [PATCH] Pre-emption control for userspace

From: Peter Zijlstra
Date: Thu Mar 06 2014 - 04:58:17 EST


On Wed, Mar 05, 2014 at 12:58:29PM -0700, Khalid Aziz wrote:
> On 03/05/2014 04:10 AM, Peter Zijlstra wrote:
> >On Tue, Mar 04, 2014 at 04:51:15PM -0800, Andi Kleen wrote:
> >>Anything else?
> >
> >Proxy execution; it's a form of PI that works for arbitrary scheduling
> >policies (thus also very much including fair).
> >
> >With that what you effectively end up with is the lock holder running
> >'boosted' by the runtime of its blocked chain until the entire chain
> >runs out of time, at which point preemption doesn't matter anyhow.
> >
>
> Hello Peter,
>
> I read through the concept of proxy execution and it is a very interesting
> concept. I come from many years of realtime and embedded systems
> development and I can easily recall various problems in the past that can be
> solved or helped by this.

Yeah; there are a few nasty cases with PEP on SMP though (the reason we
don't already have it); the trivial implementation that works wonderfully
on UP ends up capable of running the same task on multiple CPUs -- which
is an obvious fail.

There's people working on this though; as the scheme also works well for
the recently added deadline scheduler.

> Looking at the current problem I am trying to
> solve with databases and JVM, I run into the same issue I described in my
> earlier email. Proxy execution is a post-contention solution. By the time
> proxy execution can do something for my case, I have already paid the price
> of contention and a context switch which is what I am trying to avoid. For a
> critical section that is very short compared to the size of execution
> thread, which is the case I am looking at, avoiding preemption in the middle
> of that short critical section helps much more than dealing with lock
> contention later on.

Like others have already stated; it's likely still cheaper than the
pile-up you get now. It might not be optimally fast, but it sure takes
out the worst case you have now.

> The goal here is to avoid lock contention and
> associated cost. I do understand the cost of dealing with lock contention
> poorly and that can easily be much bigger cost, but I am looking into
> avoiding even getting there.

The thing is; unless userspace is an RT program or practises the same
discipline to such an extent that it makes no practical difference,
there's always going to be the case where you fail to cover the entire
critical section, at which point you're back to your pile-up fail.

So while the limited preemption guard helps the best case, it doesn't
help the worst case at all.

So supposing we went with this now; you (or someone else) will come back
in a year's time and tell us that if we only just stretch this window a
little, their favourite workload will also benefit.

Where's the end of that?

And what about CONFIG_HZ; suppose you compile your kernel with HZ=100
and your 1 extra tick is sufficient. Then someone compiles their kernel
with HZ=1000 and it all comes apart.
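
To put rough numbers on the HZ point, here is a minimal sketch of the
arithmetic. The helper names tick_usecs() and grace_covers() are
hypothetical and exist only for this illustration; they are not kernel
functions, and the 5000 us critical section used below is an assumed
figure, not a measurement.

```c
/* Length of one scheduler tick in microseconds for a given CONFIG_HZ. */
static unsigned long tick_usecs(unsigned int hz)
{
	return 1000000UL / hz;
}

/*
 * Hypothetical check (illustration only): does a grace period of
 * 'extra_ticks' ticks cover a critical section that needs
 * 'section_usecs' microseconds to complete?
 */
static int grace_covers(unsigned int hz, unsigned int extra_ticks,
			unsigned long section_usecs)
{
	return extra_ticks * tick_usecs(hz) >= section_usecs;
}
```

With HZ=100 a tick is 10000 us, so grace_covers(100, 1, 5000) is true;
with HZ=1000 a tick is 1000 us, so grace_covers(1000, 1, 5000) is false.
Same userspace code, same "1 extra tick", and the guard no longer covers
the critical section.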

