On Fri, Jun 19, 2015 at 11:50:02AM -0400, Waiman Long wrote:
> The current cmpxchg() loop that sets the _QW_WAITING flag for writers
> in queue_write_lock_slowpath() will contend with incoming readers,
> possibly causing extra, wasteful cmpxchg() operations. This patch
> changes the code to do a byte cmpxchg() to eliminate contention with
> new readers.
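
(For anyone reading along: a minimal userspace sketch of that idea,
using C11 atomics. The union layout, the _QW_* values, and the function
names here are illustrative assumptions that mirror the kernel's qrwlock
on a little-endian machine; type-punning atomics through a union is what
the kernel's own __qrwlock does, but this is not the in-tree code.)

#include <stdatomic.h>
#include <stdint.h>

#define _QW_WAITING	1U	/* a writer is waiting (low byte)  */
#define _QW_WMASK	0xffU	/* writer mode mask (low byte)     */

union qrwlock_sketch {
	_Atomic uint32_t cnts;	/* whole lock word                  */
	_Atomic uint8_t  wmode;	/* low byte only: writer mode (LE)  */
};

/* Old approach: a full-word cmpxchg that fails (and must retry)
 * every time an incoming reader bumps the reader count. */
static void set_qw_waiting_word(union qrwlock_sketch *l)
{
	uint32_t cnts;

	for (;;) {
		cnts = atomic_load(&l->cnts);
		if (!(cnts & _QW_WMASK) &&
		    atomic_compare_exchange_weak(&l->cnts, &cnts,
						 cnts | _QW_WAITING))
			break;
	}
}

/* New approach: a byte-wide cmpxchg on the writer-mode byte only,
 * so concurrent reader-count updates cannot make it fail. */
static void set_qw_waiting_byte(union qrwlock_sketch *l)
{
	for (;;) {
		uint8_t expect = 0;

		if (!atomic_load(&l->wmode) &&
		    atomic_compare_exchange_weak(&l->wmode, &expect,
						 _QW_WAITING))
			break;
	}
}
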
>
> A multithreaded microbenchmark running a 5M read_lock/write_lock loop
> on an 8-socket 80-core Westmere-EX machine running a 4.0-based kernel
> with the qspinlock patch has the following execution times (in ms)
> with and without the patch:
>
> With R:W ratio = 5:1
>
>   Threads    w/o patch    with patch    % change
>   -------    ---------    ----------    --------
>      2           990           895         -9.6%
>      3          2136          1912        -10.5%
>      4          3166          2830        -10.6%
>      5          3953          3629         -8.2%
>      6          4628          4405         -4.8%
>      7          5344          5197         -2.8%
>      8          6065          6004         -1.0%
>      9          6826          6811         -0.2%
>     10          7599          7599          0.0%
>     15          9757          9766         +0.1%
>     20         13767         13817         +0.4%
>
> With a small number of contending threads, this patch can improve
> locking performance by up to 10%. With more contending threads,
> however, the gain diminishes.
>
> With the extended qrwlock structure defined in asm-generic/qrwlock,
> the queue_write_unlock() function is also simplified to a single
> smp_store_release() call.
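
(The unlock side from that last paragraph, in the same sketch and
reusing the union above; again an illustration of the release-store
idea, not the in-tree function.)

/* Simplified write unlock: a single release store clearing the writer
 * byte, so all critical-section stores become visible before the lock
 * is seen as free. */
static void write_unlock_sketch(union qrwlock_sketch *l)
{
	atomic_store_explicit(&l->wmode, 0, memory_order_release);
}
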
>
> Signed-off-by: Waiman Long <Waiman.Long@xxxxxx>

This one does not in fact apply, seeing how I applied a previous
version.

Please send an incremental patch if you still want to change things to
this form.