Re: [PATCH 0/7] locking/rwsem: enable reader opt-spinning & writer respin

From: Waiman Long
Date: Mon Aug 04 2014 - 14:07:58 EST


On 08/04/2014 12:25 AM, Davidlohr Bueso wrote:
On Sun, 2014-08-03 at 22:36 -0400, Waiman Long wrote:
This patch set improves upon the rwsem optimistic spinning patch set
from Davidlohr to enable better performing rwsem and more aggressive
use of optimistic spinning.

By using a microbenchmark running 1 million lock-unlock operations per
thread on a 4-socket 40-core Westmere-EX x86-64 test machine running
3.16-rc7 based kernels, the following table shows the execution times
with 2/10 threads running on different CPUs on the same socket where
load is the number of pause instructions in the critical section:

lock/r:w ratio  # of threads  Load:Execution Time (ms)
--------------  ------------  ------------------------
mutex                 2       1:530.7, 5:406.0, 10:472.7
mutex                10       1:1848 , 5:2046 , 10:4394

Before patch:
rwsem/0:1             2       1:339.4, 5:368.9, 10:394.0
rwsem/1:1             2       1:2915 , 5:2621 , 10:2764
rwsem/10:1            2       1:891.2, 5:779.2, 10:827.2
rwsem/0:1            10       1:5618 , 5:5722 , 10:5683
rwsem/1:1            10       1:14562, 5:14561, 10:14770
rwsem/10:1           10       1:5914 , 5:5971 , 10:5912

After patch:
rwsem/0:1             2       1:161.1, 5:244.4, 10:271.4
rwsem/1:1             2       1:188.8, 5:212.4, 10:312.9
rwsem/10:1            2       1:168.8, 5:179.5, 10:209.8
rwsem/0:1            10       1:1306 , 5:1733 , 10:1998
rwsem/1:1            10       1:1512 , 5:1602 , 10:2093
rwsem/10:1           10       1:1267 , 5:1458 , 10:2233

% Change:
rwsem/0:1             2       1:-52.5%, 5:-33.7%, 10:-31.1%
rwsem/1:1             2       1:-93.5%, 5:-91.9%, 10:-88.7%
rwsem/10:1            2       1:-81.1%, 5:-77.0%, 10:-74.6%
rwsem/0:1            10       1:-76.8%, 5:-69.7%, 10:-64.8%
rwsem/1:1            10       1:-89.6%, 5:-89.0%, 10:-85.8%
rwsem/10:1           10       1:-78.6%, 5:-75.6%, 10:-62.2%
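
For reference, the "% Change" rows can be reproduced from the before/after
execution times. The following is only a minimal sketch in Python; the
helper name pct_change is made up for illustration and is not part of the
patch set or the benchmark. It uses the rwsem/1:1, 2-thread row as input.

```python
def pct_change(before, after):
    """Relative change in percent; negative means less execution time."""
    return (after - before) / before * 100.0


# Execution times (ms) for rwsem/1:1 with 2 threads at loads 1, 5 and 10,
# copied from the "Before patch" and "After patch" tables.
before = {1: 2915.0, 5: 2621.0, 10: 2764.0}
after = {1: 188.8, 5: 212.4, 10: 312.9}

# Round to one decimal place, matching the precision used in the table.
changes = {load: round(pct_change(before[load], after[load]), 1)
           for load in before}
print(changes)  # {1: -93.5, 5: -91.9, 10: -88.7}
```

Because these are execution times, a negative change is an improvement.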

So at a very low level you see nicer results, which aren't really
translating to much of a significant impact at a higher level (aim7).

I was using a 4-socket system for testing. I believe the performance gain
will be higher on larger machines. I will run some tests on those larger
machines as well.

It can be seen that there is dramatic reduction in the execution
times. The new rwsem is now even faster than mutex whether it is all
writers or a mixture of writers and readers.

Running the AIM7 benchmarks on the same 40-core system (HT off),
the performance improvements on some of the workloads were as follows:

Workload                  Before Patch  After Patch  % Change
--------                  ------------  -----------  --------
custom (200-1000)            446135        477404      +7.0%
custom (1100-2000)           449665        484734      +7.8%
high_systime (200-1000)      152437        154217      +1.2%
high_systime (1100-2000)     269695        278942      +3.4%

I worry about complicating rwsems even _more_ than they are, especially
for such a marginal gain. You might want to try other workloads -- ie:
postgresql (pgbench), I normally get pretty useful data when dealing
with rwsems.


Thanks for the info. I will try running pgbench as well.

-Longman