[PATCH v2 0/7] locking/rwsem: enable reader opt-spinning & writer respin

From: Waiman Long
Date: Thu Aug 07 2014 - 18:27:12 EST


v1->v2:
- Remove patch 1 which changes preempt_enable() to
  preempt_enable_no_resched().
- Remove the RWSEM_READ_OWNED macro and assume readers own the lock
  when the owner is NULL (see the sketch below).
- Reduce the spin threshold to 64.
- Enable writer respin only if spinners are present.

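As referenced above, the owner-is-NULL assumption and the spin threshold
can be modeled roughly as follows. This is a standalone sketch, not the
patch code: model_rwsem, spin_for_lock(), cpu_relax_model() and
SPIN_THRESHOLD are made-up names, and the real spinner additionally
checks whether a writer owner is still running on a CPU before deciding
to keep spinning.

/*
 * Standalone model (not the actual patch) of the v2 spin heuristic:
 * a NULL owner is assumed to mean the lock is reader-owned, and
 * spinning on a reader-owned lock is capped at a fixed threshold
 * (64 in this series).  All names here are illustrative.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SPIN_THRESHOLD	64		/* reader spin cap used in v2 */

struct model_rwsem {
	_Atomic(void *)	owner;		/* owning writer task, NULL => readers */
	atomic_bool	locked;		/* is the lock currently held? */
};

static inline void cpu_relax_model(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__("pause");
#endif
}

/*
 * Return true if the spinner should retry the acquisition, false if
 * it should give up and queue itself on the wait list instead.
 */
static bool spin_for_lock(struct model_rwsem *sem)
{
	int loop = 0;

	while (atomic_load(&sem->locked)) {
		if (atomic_load(&sem->owner) == NULL) {
			/*
			 * Assumed reader-owned: there is no single owner
			 * task whose running state can be watched, so only
			 * spin up to the threshold before giving up.
			 */
			if (++loop >= SPIN_THRESHOLD)
				return false;
		}
		/*
		 * Otherwise writer-owned: keep spinning (the real code
		 * also checks that the owner is still running on a CPU).
		 */
		cpu_relax_model();
	}
	return true;
}

The cap only applies to the reader-owned case because individual readers
are not tracked, so a spinner has no owner task to watch and must bound
its spinning.
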
This patch set improves upon the rwsem optimistic spinning patch set
from Davidlohr to enable a better-performing rwsem and more aggressive
use of optimistic spinning.

The following table shows the execution times from a microbenchmark
that performs 1 million lock-unlock operations per thread, run on a
4-socket, 40-core Westmere-EX x86-64 test machine with 3.16-rc7 based
kernels.  The tests use 2 or 10 threads running on different CPUs of
the same socket; the load is the number of pause instructions executed
in the critical section:

lock/r:w ratio    # of threads    Load : Execution Time (ms)
--------------    ------------    ---------------------------
mutex                   2         1:530.7, 5:406.0, 10:472.7
mutex                  10         1:1848 , 5:2046 , 10:4394

Before patch:
rwsem/0:1               2         1:339.4, 5:368.9, 10:394.0
rwsem/1:1               2         1:2915 , 5:2621 , 10:2764
rwsem/10:1              2         1:891.2, 5:779.2, 10:827.2
rwsem/0:1              10         1:5618 , 5:5722 , 10:5683
rwsem/1:1              10         1:14562, 5:14561, 10:14770
rwsem/10:1             10         1:5914 , 5:5971 , 10:5912

After patch:
rwsem/0:1               2         1:334.6, 5:334.5, 10:366.9
rwsem/1:1               2         1:311.0, 5:320.5, 10:300.0
rwsem/10:1              2         1:184.6, 5:180.6, 10:188.9
rwsem/0:1              10         1:1842 , 5:1925 , 10:2306
rwsem/1:1              10         1:1668 , 5:1706 , 10:1555
rwsem/10:1             10         1:1266 , 5:1294 , 10:1342

% Change:
rwsem/0:1               2         1: -1.4%, 5: -9.6%, 10: -6.7%
rwsem/1:1               2         1:-89.3%, 5:-87.7%, 10:-89.1%
rwsem/10:1              2         1:-79.3%, 5:-76.8%, 10:-77.2%
rwsem/0:1              10         1:-67.2%, 5:-66.4%, 10:-59.4%
rwsem/1:1              10         1:-88.5%, 5:-88.3%, 10:-89.5%
rwsem/10:1             10         1:-78.6%, 5:-78.3%, 10:-77.3%

There is a dramatic reduction in the execution times.  The patched
rwsem is now even faster than the mutex, whether the workload is all
writers (0:1) or a mixture of writers and readers.
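
For reference, each locking thread in the microbenchmark runs a loop
of roughly the following shape.  This is a simplified userspace sketch,
not the actual test code: the real test exercises an in-kernel rwsem,
and NR_OPS, locker(), critical_section_load() and the fixed 10:1 ratio
shown are illustrative.

/* Simplified sketch of the per-thread benchmark loop. */
#include <pthread.h>

#define NR_OPS	1000000			/* lock/unlock operations per thread */

static pthread_rwlock_t test_lock = PTHREAD_RWLOCK_INITIALIZER;

/* "load" = number of pause instructions inside the critical section */
static void critical_section_load(int load)
{
	while (load-- > 0)
		__asm__ __volatile__("pause");	/* x86 pause, as on the test box */
}

static void *locker(void *arg)
{
	int load = *(int *)arg;
	const int r_ratio = 10;		/* the 10:1 reader:writer case */
	long i;

	for (i = 0; i < NR_OPS; i++) {
		if (i % (r_ratio + 1))	/* r_ratio reads for every write */
			pthread_rwlock_rdlock(&test_lock);
		else
			pthread_rwlock_wrlock(&test_lock);

		critical_section_load(load);
		pthread_rwlock_unlock(&test_lock);
	}
	return NULL;
}

The 0:1, 1:1 and 10:1 rows in the table correspond to different
r_ratio values, with 0 meaning writer-only.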

Running the AIM7 benchmarks on a larger 8-socket, 80-core system
(HT off) shows the following performance improvements on some of the
workloads:

Workload               Before Patch   After Patch   % Change
--------               ------------   -----------   --------
alltests (200-1000)          337892        345888     + 2.4%
alltests (1100-2000)         402535        474065     +17.8%
custom (200-1000)            480651        547522     +13.9%
custom (1100-2000)           461037        561588     +21.8%
shared (200-1000)            420845        458048     + 8.8%
shared (1100-2000)           428045        473121     +10.5%

Waiman Long (7):
  locking/rwsem: check for active writer/spinner before wakeup
  locking/rwsem: threshold limited spinning for active readers
  locking/rwsem: rwsem_can_spin_on_owner can be called with preemption
    enabled
  locking/rwsem: more aggressive use of optimistic spinning
  locking/rwsem: move down rwsem_down_read_failed function
  locking/rwsem: enables optimistic spinning for readers
  locking/rwsem: allow waiting writers to go back to spinning

 include/linux/osq_lock.h    |    5 +
 kernel/locking/rwsem-xadd.c |  348 ++++++++++++++++++++++++++++++++++---------
 2 files changed, 283 insertions(+), 70 deletions(-)
