Subject: locking/rwbase: Take care of ordering guarantee of fastpath reader
From: Boqun Feng <boqun.feng@xxxxxxxxx>
Date: Wed, 1 Sep 2021 23:06:27 +0800
Readers of rwbase can lock and unlock without taking any inner lock; if
that happens, we need the ordering provided by atomic operations to
satisfy the ordering semantics of lock/unlock. Without that, consider the
following case:
	{ X = 0 initially }

	CPU 0				CPU 1
	=====				=====
	rt_write_lock();
	X = 1
	rt_write_unlock():
	  atomic_add(READER_BIAS - WRITER_BIAS, ->readers);
	  // ->readers is READER_BIAS.
				rt_read_lock():
				  if ((r = atomic_read(->readers)) < 0) // True
				    atomic_try_cmpxchg(->readers, r, r + 1); // succeed.
				  <acquire the read lock via fast path>

				r1 = X;	// r1 may be 0, because nothing prevents
					// the reordering of "X = 1" and
					// atomic_add() on CPU 0.
Therefore, audit every usage of atomic operations that may happen in a
fast path, and add the necessary barriers.
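
As an illustration, the litmus test above can be modeled as a standalone
C11 program. This is a sketch only: the pthread scaffolding, the writer()/
reader() names and the small bias values are invented for the example, and
only the signs of the biases mirror the kernel's constants. With both
sides relaxed, the "r1 == 0" outcome is permitted; making the unlock a
release and the fastpath cmpxchg an acquire rules it out:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

/* Invented stand-ins; only the signs match the kernel's bias constants. */
#define READER_BIAS	(-100)	/* unlocked: negative, readers may fastpath */
#define WRITER_BIAS	50	/* write locked: positive, so "r < 0" fails */

static atomic_int readers = WRITER_BIAS;	/* writer holds the lock */
static int X;

static void *writer(void *arg)
{
	X = 1;					/* write critical section */
	/* Buggy: a relaxed RMW lets "X = 1" be reordered past the unlock;
	 * the fix is memory_order_release here. */
	atomic_fetch_add_explicit(&readers, READER_BIAS - WRITER_BIAS,
				  memory_order_relaxed);
	return NULL;
}

static void *reader(void *arg)
{
	int r;

	do {					/* wait until the writer is gone */
		r = atomic_load_explicit(&readers, memory_order_relaxed);
	} while (r >= 0);

	/* Buggy: a relaxed cmpxchg gives the fastpath lock no ACQUIRE;
	 * the fix is memory_order_acquire on success. */
	if (atomic_compare_exchange_strong_explicit(&readers, &r, r + 1,
						    memory_order_relaxed,
						    memory_order_relaxed))
		printf("r1 = %d\n", X);		/* r1 == 0 is allowed as-is */
	return NULL;
}

int main(void)
{
	pthread_t tw, tr;

	pthread_create(&tw, NULL, writer, NULL);
	pthread_create(&tr, NULL, reader, NULL);
	pthread_join(tw, NULL);
	pthread_join(tr, NULL);
	return 0;
}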
Signed-off-by: Boqun Feng <boqun.feng@xxxxxxxxx>
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Link: https://lkml.kernel.org/r/20210901150627.620830-1-boqun.feng@xxxxxxxxx
---
kernel/locking/rwbase_rt.c | 41 ++++++++++++++++++++++++++++++++++++-----
1 file changed, 36 insertions(+), 5 deletions(-)
--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -41,6 +41,12 @@
* The risk of writer starvation is there, but the pathological use cases
* which trigger it are not necessarily the typical RT workloads.
*
+ * Fast-path orderings:
+ * The lock/unlock of readers can run in fast paths: lock and unlock are only
+ * atomic ops, and there is no inner lock to provide ACQUIRE and RELEASE
+ * semantics of rwbase_rt. Atomic ops then should be stronger than _acquire()
+ * and _release() to provide necessary ordering guarantee.
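
To make that concrete, here is a reader-fastpath sketch in the style of
rwbase_rt. It is an illustration under assumptions: the hunks adding the
_acquire()/_release() variants are elided from this excerpt, so the body
below is a plausible shape of the fix, not a quote of the patch:

static __always_inline int rwbase_read_trylock(struct rwbase_rt *rwb)
{
	int r;

	/*
	 * The _acquire() pairs with the RELEASE of the previous unlock
	 * (writer or reader), making the prior critical section visible
	 * to this new reader.
	 */
	for (r = atomic_read(&rwb->readers); r < 0;) {
		if (likely(atomic_try_cmpxchg_acquire(&rwb->readers, &r, r + 1)))
			return 1;	/* fast path: read lock acquired */
	}
	return 0;	/* a writer is involved: fall back to the slow path */
}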
@@ -210,14 +224,23 @@ static int __sched rwbase_write_lock(struct rwbase_rt *rwb,
atomic_sub(READER_BIAS, &rwb->readers);
raw_spin_lock_irqsave(&rtm->wait_lock, flags);
+
+ /* The below set_*_state() thingy implies smp_mb() to provide ACQUIRE */
+ readers = atomic_read(&rwb->readers);
/*
* set_current_state() for rw_semaphore
* current_save_and_set_rtlock_wait_state() for rwlock
*/
rwbase_set_and_save_current_state(state);
- /* Block until all readers have left the critical section. */
- for (; atomic_read(&rwb->readers);) {
+ /*
+ * Block until all readers have left the critical section.
+ *
+ * _acquire() is needed in case the reader side runs in the fast