Re: [RFC][PATCH RT] rwsem_rt: Another (more sane) approach to multireader rt locks

From: John Kacur
Date: Tue May 15 2012 - 14:01:02 EST



On Tue, 15 May 2012, Steven Rostedt wrote:

> The RT patch has been having lots of trouble lately with large machines
> and applications running lots of threads. This usually boils down to a
> bottleneck on a single lock: the mm->mmap_sem.
>
> The mmap_sem is a rwsem, which can sleep, and it can be taken for read
> or for write: a read lock can be held by several tasks at the same
> time, while the write lock can be held by only a single task.
>
> But due to priority inheritance, having multiple readers makes the code
> much more complex, thus the -rt patch converts all rwsems into a single
> mutex, where readers may nest (the same task may grab the same rwsem for
> read multiple times), but only one task may hold the rwsem at any given
> time (for read or write).
>
> When we have lots of threads, the rwsem may be taken often, either for
> memory allocation or filling in page faults. This becomes a bottleneck
> for threads as only one thread at a time may grab the mmap_sem (which is
> shared by all threads of a process).
>
> Previous attempts at adding multiple readers became too complex and
> were error prone. This approach uses a much simpler technique, one
> that is actually used by per-CPU locks.
>
> The idea here is to have an rwsem create a rt_mutex for each CPU.
> Actually, it creates a rwsem for each CPU that can only be acquired by
> one task at a time. This allows for readers on separate CPUs to take
> only the per cpu lock. When a writer needs to take a lock, it must grab
> all CPU locks before continuing.
>
> This approach does nothing special with the rt_mutex or priority
> inheritance code. That stays the same, and works normally (thus less
> error prone). The trick here is that when a reader takes a rwsem for
> read, it must disable migration, that way it can unlock the rwsem
> without needing any special searches (which lock did it take?).
>
> I've tested this a bit, and so far it works well. I haven't found a nice
> way to initialize the locks, so I'm using the silly initialize_rwsem()
> at all places that acquire the lock. But we can work on this later.
>
> Also, I don't use per_cpu sections for the locks, which means we have
> cache line collisions, but a normal (mainline) rwsem has that as well.
>
> These all leave room for improvement (which is why this is just an RFC
> patch).
>
> I'll see if I can get some numbers to see how this fixes the issues with
> heavily threaded workloads on big boxes.
>
> Thoughts?
>
> -- Steve
>
> Not-yet-signed-off-by: Steven Rostedt <rostedt@xxxxxxxxxxx>
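
For anyone following along, the per-CPU scheme described above can be
modeled in user space roughly as below. This is only a sketch under
stated assumptions, not the code from the patch: pthread mutexes stand
in for rt_mutexes, a caller-supplied slot index stands in for "the
current CPU with migration disabled", and every name here (NR_SLOTS,
struct pcpu_rwsem, pcpu_down_read() and friends) is made up for
illustration. It also ignores the reader nesting that the -rt rwsem
allows.

```c
#include <assert.h>
#include <pthread.h>

#define NR_SLOTS 4	/* hypothetical stand-in for the number of CPUs */

/* One lock per slot; each can be held by only one task at a time. */
struct pcpu_rwsem {
	pthread_mutex_t lock[NR_SLOTS];
};

static void pcpu_rwsem_init(struct pcpu_rwsem *sem)
{
	for (int i = 0; i < NR_SLOTS; i++)
		pthread_mutex_init(&sem->lock[i], NULL);
}

/*
 * Reader side: take only the lock of the caller's slot. The slot is
 * returned so the caller can hand it back on unlock, which models the
 * "migration disabled" rule: the unlock must hit the same lock.
 */
static int pcpu_down_read(struct pcpu_rwsem *sem, int slot)
{
	assert(slot >= 0 && slot < NR_SLOTS);
	pthread_mutex_lock(&sem->lock[slot]);
	return slot;
}

static void pcpu_up_read(struct pcpu_rwsem *sem, int slot)
{
	pthread_mutex_unlock(&sem->lock[slot]);
}

/*
 * Writer side: take every slot lock, always in ascending order so two
 * writers cannot deadlock against each other.
 */
static void pcpu_down_write(struct pcpu_rwsem *sem)
{
	for (int i = 0; i < NR_SLOTS; i++)
		pthread_mutex_lock(&sem->lock[i]);
}

static void pcpu_up_write(struct pcpu_rwsem *sem)
{
	for (int i = NR_SLOTS - 1; i >= 0; i--)
		pthread_mutex_unlock(&sem->lock[i]);
}
```

Readers on different slots never touch the same mutex, so they do not
serialize against each other, while a writer pays the cost of walking
all NR_SLOTS locks. Two readers that land on the same slot still
exclude each other, which matches the "one task at a time per CPU"
description above.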

It looks interesting. I wanted to compile it and test it, but started
running into some problems. I fixed two simple things, but wanted to wait
to see whether you would follow Peter's suggestion for lockdep before
proceeding too far.

Thanks
John