From: Joel Fernandes
Date: Sat Jun 08 2019 - 20:29:24 EST

On Fri, May 31, 2019 at 10:43 AM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> > Either way, it would be good for you to just try it. Create a kernel
> > module or similar that hammers on percpu_down_read() and percpu_up_read(),
> > and empirically check the scalability on a largish system. Then compare
> > this to down_read() and up_read()
> Will do! thanks.

I created a test for this, and the results are quite amazing. It just
stresses read lock/unlock for rwsem vs. percpu-rwsem.
The test was run on a dual-socket Intel x86_64 machine with 14 cores
per socket.

The test runs 10,000,000 read lock/unlock loops for rwsem and for percpu-rwsem:
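
For illustration only, here is a userspace analogue of this kind of
measurement. It is not the kernel test itself. It assumes, roughly, that
an rwsem reader updates one shared count atomically while a
percpu-rwsem reader on the fast path only touches a per-CPU counter.
Everything named here (run_test, readers_active, the mode names,
MAX_THREADS, the 64-byte cache line) is hypothetical and chosen for the
sketch.

```c
/* Userspace analogue only; NOT the kernel module discussed in this thread. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define MAX_THREADS 64
#define CACHELINE   64   /* assumed cache line size */

enum mode { SHARED_COUNTER, PER_THREAD_COUNTER };

/* One word that every thread updates (stand-in for a shared reader count). */
static atomic_long shared_count;

/* One cache-line-aligned slot per thread (stand-in for a per-CPU counter). */
struct slot {
    _Alignas(CACHELINE) atomic_long count;
};
static struct slot slots[MAX_THREADS];

struct worker_arg {
    enum mode mode;
    int id;
    long loops;
    long done;   /* number of enter/leave pairs completed */
};

struct result {
    double seconds;  /* wall time, including thread create and join */
    long ops;        /* total enter/leave pairs across all threads */
};

static void *worker(void *p)
{
    struct worker_arg *a = p;
    atomic_long *c = (a->mode == SHARED_COUNTER)
                         ? &shared_count
                         : &slots[a->id].count;
    long i;

    for (i = 0; i < a->loops; i++) {
        atomic_fetch_add(c, 1);  /* reader enters */
        atomic_fetch_sub(c, 1);  /* reader leaves */
    }
    a->done = i;
    return NULL;
}

/* Run nthreads workers in the given mode and time the whole run. */
struct result run_test(enum mode mode, int nthreads, long loops)
{
    pthread_t tid[MAX_THREADS];
    struct worker_arg args[MAX_THREADS];
    struct timespec t0, t1;
    struct result r = { 0.0, 0 };

    assert(nthreads >= 1 && nthreads <= MAX_THREADS);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < nthreads; i++) {
        args[i] = (struct worker_arg){ .mode = mode, .id = i,
                                       .loops = loops, .done = 0 };
        if (pthread_create(&tid[i], NULL, worker, &args[i]) != 0) {
            fprintf(stderr, "pthread_create failed\n");
            exit(EXIT_FAILURE);
        }
    }
    for (int i = 0; i < nthreads; i++) {
        pthread_join(tid[i], NULL);
        r.ops += args[i].done;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    r.seconds = (double)(t1.tv_sec - t0.tv_sec)
              + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return r;
}

/* Sum of all counters; zero once every worker has finished. */
long readers_active(void)
{
    long sum = atomic_load(&shared_count);

    for (int i = 0; i < MAX_THREADS; i++)
        sum += atomic_load(&slots[i].count);
    return sum;
}
```

Calling run_test(SHARED_COUNTER, n, 10000000) and
run_test(PER_THREAD_COUNTER, n, 10000000) for increasing n and printing
the seconds field gives one curve per mode. The numbers depend on the
machine and say nothing by themselves about the kernel primitives.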

Graphs/Results here:

The completion time of the test goes up somewhat exponentially with
the number of threads in the rwsem case, whereas for percpu-rwsem it
stays the same. I could add this data to some of the documentation as


- Joel