Re: [RFC] [PATCH] Pre-emption control for userspace

From: David Lang
Date: Wed Mar 05 2014 - 19:01:49 EST


On Wed, 5 Mar 2014, Khalid Aziz wrote:

> On 03/05/2014 04:13 PM, David Lang wrote:
>> Yes, you have paid the cost of the context switch, but your original
>> problem description talked about having multiple other threads trying to
>> get the lock, then spinning trying to get the lock (wasting time if the
>> process holding it is asleep, but not if it's running on another core),
>> and causing a long delay before the process holding the lock gets a
>> chance to run again.
>>
>> Having the threads immediately yield to the process that has the lock
>> reduces this down to two context switches, which isn't perfect, but it's
>> a LOT better than what you started from.

> OK, let us consider the multiple-core scenario:
>
> - Thread A gets scheduled on core 1 and runs for 95% of its timeslice before it gets to its critical section.
> - Thread A grabs the lock and quickly reaches the end of its timeslice before finishing its critical section.
> - Thread A is preempted on core 1 by a completely unrelated thread.
> - Thread B is in the meantime scheduled on core 2 and happens to reach its critical section right away, where it tries to grab the lock held by thread A. It spins for a bit waiting to see if the lock becomes available, gives up, and yields to the next process in the queue.
> - Since thread A ran recently, it is now stuck towards the end of the run queue, so thread C gets to run on core 2, where it suffers the same fate as thread B.
>
> Now scale this scenario across more cores and more threads that all want the same lock to execute their small critical sections. This cost of spinning and context switches could have been avoided if thread A could get an additional timeslice to complete its critical section. Yielding to the process holding the lock happens only after contention has occurred, and we have already paid the price of two context switches for the lock owner.
>
> When yield_to() happens, the lock owner may not get to run on the core it was on, because an unrelated thread is running on that core and the owner must wait for that thread's timeslice to run out. If the lock owner gets scheduled on another core, we pay the price of repopulating the cache for a new thread on that core. yield_to() is better than letting a convoy of processes build up waiting for the same lock, but worse than avoiding the contention altogether.

with yield_to(), when thread B hits the contention, thread A gets more time and is able to complete immediately, so by the time thread C gets around to running, there is no longer any contention.


>> well, writing to something in /proc isn't free either. And how is the
>> thread supposed to know if it needs to do so, or whether it's going to have
>> enough time to finish its work before it's out of time (how can it know
>> how much time it has left, anyway?)

> The cost is writing to a memory location, since the thread is using mmap; not insignificant, but hardly expensive. The thread does not need to know how much time it has left in the current timeslice. It always sets the flag to request pre-emption immunity before entering the critical section and clears the flag when it exits its critical section. If the thread comes up for pre-emption while the flag is set, it gets immunity. If it does not, the flag will be cleared at the end of the critical section anyway.

what's the cost to set up the mmap of this file in /proc? this is sounding like a lot of work.

>> is this gain from not giving up the CPU at all? or is it from avoiding
>> all the delays due to the contending threads trying in turn? the
>> yield_to() approach avoids all those other threads trying in turn, so it
>> should get fairly close to the same benefits.


> The gain is from avoiding contention by giving the locking thread a chance to complete its critical section, which is expected to be very short (certainly shorter than a timeslice). Pre-emption immunity gives it one, and only one, additional timeslice.

but the yield_to() does almost the same thing. There is a small bump, but you don't have to wait for thread B to spin, then thread C, and so on, before thread A can finish its work. As soon as the second thread hits the critical section, thread A is going to be able to do more work (and hopefully finish).

> Hope this helps clear things up.

It doesn't sound like you and I have the same understanding of how the yield_to() approach would work. I hope my comments here have helped get us on the same page.

David Lang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/