Re: [PATCH] locking/rwsem: use read_acquire in read_slowpath exit when queue is empty

From: Waiman Long
Date: Tue Jul 16 2019 - 15:09:54 EST


On 7/16/19 2:58 PM, Peter Zijlstra wrote:
> On Tue, Jul 16, 2019 at 12:53:14PM -0400, Waiman Long wrote:
>> On 7/16/19 12:04 PM, Jan Stancek wrote:
> Fixes: 4b486b535c33 ("locking/rwsem: Exit read lock slowpath if queue empty & no writer")
> Signed-off-by: Jan Stancek <jstancek@xxxxxxxxxx>
> Cc: Waiman Long <longman@xxxxxxxxxx>
> Cc: Davidlohr Bueso <dbueso@xxxxxxx>
> Cc: Will Deacon <will@xxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> ---
> kernel/locking/rwsem.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
> index 37524a47f002..757b198d7a5b 100644
> --- a/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -1030,7 +1030,7 @@ static inline bool rwsem_reader_phase_trylock(struct rw_semaphore *sem,
>  		 * exit the slowpath and return immediately as its
>  		 * RWSEM_READER_BIAS has already been set in the count.
>  		 */
> -		if (adjustment && !(atomic_long_read(&sem->count) &
> +		if (adjustment && !(atomic_long_read_acquire(&sem->count) &
>  		    (RWSEM_WRITER_MASK | RWSEM_FLAG_HANDOFF))) {
>  			raw_spin_unlock_irq(&sem->wait_lock);
>  			rwsem_set_reader_owned(sem);
>> The chance of taking this path is not that high. So instead of
>> increasing the cost of the test by adding an acquire barrier, how about
>> just adding smp_mb__after_spinlock() before spin_unlock_irq(). This
>> should have the same effect of making sure that no stale data will be
>> used in the read-lock critical section.
> That's actually more expensive on something like ARM64 I expect.
>
> The far cheaper alternative is smp_acquire__after_ctrl_dep(), however in
> general Will seems to prefer using load-acquire over separate barriers,
> and for x86 it doesn't matter anyway. For PowerPC these two are a wash,
> both end up with LWSYNC (over SYNC for your alternative).
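
For completeness, my reading of the smp_acquire__after_ctrl_dep() variant
you mention is something like the following (untested sketch, not what Jan
posted):

	if (adjustment && !(atomic_long_read(&sem->count) &
	    (RWSEM_WRITER_MASK | RWSEM_FLAG_HANDOFF))) {
		/*
		 * The branch gives us a control dependency on the plain
		 * load of sem->count; this turns that control dependency
		 * into ACQUIRE ordering, so the reader critical section
		 * cannot observe stale data from before the writer
		 * released the lock.
		 */
		smp_acquire__after_ctrl_dep();
		raw_spin_unlock_irq(&sem->wait_lock);
		rwsem_set_reader_owned(sem);
		return sem;
	}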

With lock event counting turned on, my experience with this code path
was that it got hit very infrequently, and it is even less frequent with
the latest reader optimistic spinning patch. That is why I prefer making
it a bit more costly when the condition is true, without incurring any
cost at all when the condition is false, which is the majority of the
cases. Anyway, this additional cost is for arm64 only, and it is still
more than compensated for by skipping all the wait-list manipulation and
self-wakeup code.
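
To be concrete about my earlier suggestion, the placement I had in mind
is along these lines (untested sketch), so the barrier is only paid on
the rarely-taken early exit and the common path is unaffected:

	if (adjustment && !(atomic_long_read(&sem->count) &
	    (RWSEM_WRITER_MASK | RWSEM_FLAG_HANDOFF))) {
		/*
		 * We are still holding the wait_lock here, so this
		 * provides a full barrier only when the early-exit
		 * condition is true.
		 */
		smp_mb__after_spinlock();
		raw_spin_unlock_irq(&sem->wait_lock);
		rwsem_set_reader_owned(sem);
		return sem;
	}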

Cheers,
Longman