On Sat, Mar 19, 2016 at 11:21:19PM -0400, Waiman Long wrote:
> In qrwlock, the reader that is spinning on the lock will need to notify
> the next reader in the queue when the lock is free. That introduces a
> reader-to-reader latency that is not present in the original rwlock.

How did you find this 'problem'?

> That is the price for reducing lock cacheline contention. It also
> reduces the performance benefit of qrwlock on reader heavy workloads.

So the embedded people might not like that much.

> However, if we allow a limited number of readers to spin on the
> lock simultaneously, we can eliminate some of the reader-to-reader
> latencies at the expense of a bit more cacheline contention and
> probably more power consumption.
>
> This patch changes the reader slowpath to allow multiple readers to
> spin on the lock. The maximum number of concurrent readers allowed
> is currently set to 4 to limit the amount of additional cacheline
> contention while improving reader performance on most workloads. If
> a writer comes to the queue head, however, it will stop additional
> readers from coming out.
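
Just so I'm sure I follow the intent: ignoring the in_interrupt() path and
the writer-waiting state, the scheme as described boils down to something
like the userspace toy below. This is my own scratch model, not the kernel
code; every name in it except MAX_SPINNING_READERS is made up, and the
memory ordering is only approximate.

/*
 * qrw_model.c - userspace sketch of the "bounded spinning readers" idea.
 * Build: cc -O2 -pthread qrw_model.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define QR_SHIFT		9	/* reader count above the writer byte */
#define QR_BIAS			(1u << QR_SHIFT)
#define QW_LOCKED		0xffu	/* low byte: writer holds the lock */
#define QW_WMASK		0xffu
#define MAX_SPINNING_READERS	4	/* the knob from the patch */

static atomic_uint cnts;				/* ~ lock->cnts */
static atomic_flag wait_lock = ATOMIC_FLAG_INIT;	/* ~ lock->wait_lock */
static long shared;

static void queue_lock(void)   { while (atomic_flag_test_and_set_explicit(&wait_lock, memory_order_acquire)) ; }
static void queue_unlock(void) { atomic_flag_clear_explicit(&wait_lock, memory_order_release); }

/* Patched reader slowpath: every queued reader spins itself, but only
 * while fewer than MAX_SPINNING_READERS readers are already spinning
 * does it release the wait queue early. */
static void read_lock(void)
{
	bool locked = true;		/* do we still own the wait queue? */
	unsigned int c;

	queue_lock();
	c = atomic_fetch_add_explicit(&cnts, QR_BIAS, memory_order_acquire);
	while ((c & QW_WMASK) == QW_LOCKED) {
		if (locked && (c >> QR_SHIFT) < MAX_SPINNING_READERS) {
			queue_unlock();	/* let the next waiter come forward */
			locked = false;
		}
		c = atomic_load_explicit(&cnts, memory_order_acquire);
	}
	if (locked)
		queue_unlock();		/* classic handoff to the next waiter */
}

static void read_unlock(void) { atomic_fetch_sub_explicit(&cnts, QR_BIAS, memory_order_release); }

/* Crude writer: own the wait queue, then wait for the readers to drain. */
static void write_lock(void)
{
	unsigned int zero = 0;

	queue_lock();
	while (!atomic_compare_exchange_weak_explicit(&cnts, &zero, QW_LOCKED,
						      memory_order_acquire,
						      memory_order_relaxed))
		zero = 0;
}

static void write_unlock(void)
{
	atomic_fetch_sub_explicit(&cnts, QW_LOCKED, memory_order_release);
	queue_unlock();
}

static void *reader(void *a) { for (int i = 0; i < 200000; i++) { read_lock(); (void)shared; read_unlock(); } return NULL; }
static void *writer(void *a) { for (int i = 0; i < 20000;  i++) { write_lock(); shared++;    write_unlock(); } return NULL; }

int main(void)
{
	pthread_t r[4], w;

	pthread_create(&w, NULL, writer, NULL);
	for (int i = 0; i < 4; i++)
		pthread_create(&r[i], NULL, reader, NULL);
	for (int i = 0; i < 4; i++)
		pthread_join(r[i], NULL);
	pthread_join(w, NULL);
	printf("shared = %ld (expect 20000)\n", shared);
	return 0;
}

With MAX_SPINNING_READERS set to 1 this degenerates to the current
behaviour, where the queue head is the only spinning reader and every
other reader pays the hand-off latency the changelog describes.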

> Using a multi-threaded locking microbenchmark on a 4-socket 40-core
> Haswell-EX system, the locking throughput of the 4.5-rc6 kernel with or
> without the patch was as follows:

Do you have an actual real world benchmark where this makes a
difference?
> /**
>  * queued_read_lock_slowpath - acquire read lock of a queue rwlock
>  * @lock: Pointer to queue rwlock structure
>  * @cnts: Current qrwlock lock value
>  */
> void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
> {
> +	bool locked = true;
> +
> 	/*
> 	 * Readers come here when they cannot get the lock without waiting
> 	 */
> @@ -78,7 +71,10 @@ void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
> 		 * semantics) until the lock is available without waiting in
> 		 * the queue.
> 		 */
> +		while ((cnts & _QW_WMASK) == _QW_LOCKED) {
> +			cpu_relax_lowlatency();
> +			cnts = atomic_read_acquire(&lock->cnts);
> +		}
> 		return;
> 	}
> 	atomic_sub(_QR_BIAS, &lock->cnts);
> @@ -92,14 +88,31 @@ void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
> 	 * The ACQUIRE semantics of the following spinning code ensure
> 	 * that accesses can't leak upwards out of our subsequent critical
> 	 * section in the case that the lock is currently held for write.
> +	 *
> +	 * The reader increments the reader count & waits until the writer
> +	 * releases the lock.
> 	 */
> 	cnts = atomic_add_return_acquire(_QR_BIAS, &lock->cnts) - _QR_BIAS;
> +	while ((cnts & _QW_WMASK) == _QW_LOCKED) {
> +		if (locked && ((cnts >> _QR_SHIFT) < MAX_SPINNING_READERS)) {
> +			/*
> +			 * Unlock the wait queue so that more readers can
> +			 * come forward and wait for the writer to exit
> +			 * as long as no more than MAX_SPINNING_READERS
> +			 * readers are present.
> +			 */
> +			arch_spin_unlock(&lock->wait_lock);
> +			locked = false;

Only 1 more can come forward with this logic. How can you ever get to 4?

Also, what says the next in queue is a reader?
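
FWIW, if it would help to see how far the cascade actually gets, the toy
model above is easy to instrument; the fragment below (again my own scratch
code, nothing to do with the patch itself) records the largest number of
readers observed spinning at the same time:

/* Drop this into qrw_model.c from the earlier sketch. */
static atomic_int spinning, max_spinning;

static void note_spin_begin(void)
{
	int n = atomic_fetch_add_explicit(&spinning, 1, memory_order_relaxed) + 1;
	int m = atomic_load_explicit(&max_spinning, memory_order_relaxed);

	/* racy max update; good enough for a rough count */
	while (n > m &&
	       !atomic_compare_exchange_weak_explicit(&max_spinning, &m, n,
						      memory_order_relaxed,
						      memory_order_relaxed))
		;	/* a failed CAS reloads m with the current maximum */
}

static void note_spin_end(void)
{
	atomic_fetch_sub_explicit(&spinning, 1, memory_order_relaxed);
}

Calling note_spin_begin() when read_lock() first sees QW_LOCKED,
note_spin_end() after the spin loop, and printing max_spinning from main()
gives a rough answer to how often more than one extra reader really comes
forward before a writer reaches the queue head.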