Re: [RFC] [PATCH] Pre-emption control for userspace

From: Khalid Aziz
Date: Mon Mar 03 2014 - 18:30:51 EST

Next message: Linus Torvalds: "Re: Update of file offset on write() etc. is non-atomic with I/O"
Previous message: Linus Torvalds: "Re: [PATCH 0/1] mm, shmem: map few pages around fault address if they are in page cache"
In reply to: Davidlohr Bueso: "Re: [RFC] [PATCH] Pre-emption control for userspace"
Next in thread: Oleg Nesterov: "Re: [RFC] [PATCH] Pre-emption control for userspace"

On 03/03/2014 02:51 PM, Davidlohr Bueso wrote:

> On Mon, 2014-03-03 at 11:07 -0700, Khalid Aziz wrote:
>> I am working on a feature that has been requested by database folks that
>> helps with performance. Some of the oft-executed database code uses
>> mutexes to lock other threads out of a critical section. They often see
>> a situation where a thread grabs the mutex, runs out of its timeslice
>> and gets switched out, which then causes another thread to run which
>> tries to grab the same mutex, spins for a while and finally gives up.
>
> This strikes me as more of a feature for a real-time kernel. It is
> definitely an interesting concept but I wonder about it being abused.
> Also, what about just using a voluntary preemption model instead? I'd
> think that systems where this is really a problem would opt for that.

That was my first thought as well when I was asked to implement this
feature :) Designing a system as a real-time system does give the
designer good control over pre-emption, but the database folks do not
really want or need a full real-time system, especially since they may
have to run on the same server as other database-related services. A JVM
certainly cannot expect to be run as a realtime process. Database folks
are perfectly happy running with the CFS scheduler all the time except
during this kind of critical section. This approach gives them some
control to get an extra timeslice when they need it.

As for abuse, it is no different from a realtime process, which can lock
up a processor much worse than this approach can. As is the case when
using realtime schedulers, one must use the tools wisely. I have thought
about allowing sysadmins to lock this functionality down somewhat, but
that adds complexity. I am open to doing that if most people feel it is
necessary.

>> This can happen with multiple threads until the original lock owner gets
>> the CPU again and can complete executing its critical section. This
>> queueing and subsequent CPU cycle wastage can be avoided if the locking
>> thread could request to be granted an additional timeslice if its
>> current timeslice runs out before it gives up the lock. Other operating
>> systems have implemented this functionality and it is used by databases
>> as well as the JVM. This functionality has been shown to improve
>> performance by 3%-5%.
>
> Could you elaborate more on those performance numbers? What
> benchmark/workload?
>
> Thanks,
> Davidlohr

This was with TPC-C.

Thanks,
Khalid
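
To make the scenario from the original proposal concrete, here is a toy
round-robin model of it. It is only a sketch under stated assumptions,
not the patch's mechanism and not how CFS works: every name in it
(simulate, pre_work, spin_limit, allow_extension) is made up for
illustration, time is counted in abstract ticks, and the numbers it
produces say nothing about the 3%-5% figure quoted above. Each thread
does some unlocked work, takes a single shared lock, runs its critical
section, and exits. A thread that finds the lock held spins for up to
spin_limit ticks and then yields.

```python
from collections import deque


def simulate(num_threads, timeslice, pre_work, critical_len,
             spin_limit, allow_extension):
    """Toy model; returns (total_ticks, ticks_wasted_spinning)."""
    pre = [pre_work] * num_threads       # unlocked work left per thread
    crit = [critical_len] * num_threads  # critical-section work left
    owner = None                         # thread currently holding the lock
    ticks = 0
    wasted = 0
    run_queue = deque(range(num_threads))

    while run_queue:
        tid = run_queue.popleft()
        budget = timeslice
        extended = False
        spun = 0
        finished = False
        while True:
            if budget == 0:
                # Hypothetical policy: a lock holder may get exactly one
                # additional timeslice before being switched out.
                if allow_extension and owner == tid and not extended:
                    budget = timeslice
                    extended = True
                else:
                    break
            if pre[tid] > 0:
                pre[tid] -= 1
            elif owner is None or owner == tid:
                owner = tid              # acquire, or keep holding, the lock
                crit[tid] -= 1
                if crit[tid] == 0:
                    owner = None         # release the lock; thread is done
                    finished = True
            else:
                # Lock is held by a thread that was switched out.
                if spun == spin_limit:
                    break                # give up and yield the CPU
                spun += 1
                wasted += 1
            ticks += 1
            budget -= 1
            if finished:
                break
        if not finished:
            run_queue.append(tid)
    return ticks, wasted


if __name__ == "__main__":
    args = dict(num_threads=3, timeslice=4, pre_work=2,
                critical_len=4, spin_limit=3)
    print(simulate(allow_extension=False, **args))  # (22, 4)
    print(simulate(allow_extension=True, **args))   # (18, 0)
```

With these made-up parameters the first thread is switched out halfway
through its critical section, and the other two threads each burn the
rest of their timeslice spinning on the lock. Granting the lock holder
one extra timeslice lets it finish and release the lock, so no ticks are
spent spinning in this model.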

