Re: [PATCH v5] locking/rwsem: Make handoff bit handling more consistent

From: Waiman Long
Date: Tue Nov 16 2021 - 09:52:16 EST



On 11/16/21 04:24, Peter Zijlstra wrote:
On Tue, Nov 16, 2021 at 10:14:20AM +0100, Peter Zijlstra wrote:
On Tue, Nov 16, 2021 at 10:52:42AM +0800, Aiqun(Maria) Yu wrote:
On 11/16/2021 9:29 AM, Waiman Long wrote:
There are some inconsistency in the way that the handoff bit is being
handled in readers and writers.

Firstly, when a queue head writer set the handoff bit, it will clear it
when the writer is being killed or interrupted on its way out without
acquiring the lock. That is not the case for a queue head reader. The
handoff bit will simply be inherited by the next waiter.

Secondly, in the out_nolock path of rwsem_down_read_slowpath(), both
the waiter and handoff bits are cleared if the wait queue becomes empty.
For rwsem_down_write_slowpath(), however, the handoff bit is not checked
and cleared if the wait queue is empty. This can potentially make the
handoff bit set with empty wait queue.

To make the handoff bit handling more consistent and robust, extract
out handoff bit clearing code into the new rwsem_del_waiter() helper
function. The common function will only use atomic_long_andnot() to
clear bits when the wait queue is empty to avoid possible race condition.
we do have race condition needed to be fixed with this change.
Indeed, let me edit the changelog to reflect that. Also, I think, it
needs a Reported-by:.
How's something liks so then?

---
Subject: locking/rwsem: Make handoff bit handling more consistent
From: Waiman Long <longman@xxxxxxxxxx>
Date: Mon, 15 Nov 2021 20:29:12 -0500

From: Waiman Long <longman@xxxxxxxxxx>

There are some inconsistency in the way that the handoff bit is being
handled in readers and writers that lead to a race condition.

Firstly, when a queue head writer set the handoff bit, it will clear
it when the writer is being killed or interrupted on its way out
without acquiring the lock. That is not the case for a queue head
reader. The handoff bit will simply be inherited by the next waiter.

Secondly, in the out_nolock path of rwsem_down_read_slowpath(), both
the waiter and handoff bits are cleared if the wait queue becomes
empty. For rwsem_down_write_slowpath(), however, the handoff bit is
not checked and cleared if the wait queue is empty. This can
potentially make the handoff bit set with empty wait queue.

Worse, the situation in rwsem_down_write_slowpath() relies on wstate,
a variable set outside of the critical section containing the ->count
manipulation, this leads to race condition where RWSEM_FLAG_HANDOFF
can be double subtracted, corrupting ->count.

To make the handoff bit handling more consistent and robust, extract
out handoff bit clearing code into the new rwsem_del_waiter() helper
function. Also, completely eradicate wstate; always evaluate
everything inside the same critical section.

The common function will only use atomic_long_andnot() to clear bits
when the wait queue is empty to avoid possible race condition. If the
first waiter with handoff bit set is killed or interrupted to exit the
slowpath without acquiring the lock, the next waiter will inherit the
handoff bit.

While at it, simplify the trylock for loop in
rwsem_down_write_slowpath() to make it easier to read.

Fixes: 4f23dbc1e657 ("locking/rwsem: Implement lock handoff to prevent lock starvation")
Reported-by: Zhenhua Ma <mazhenhua@xxxxxxxxxx>
Suggested-by: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Link: https://lkml.kernel.org/r/20211116012912.723980-1-longman@xxxxxxxxxx
---

Yes, that looks good to me. Thanks for the editing.

Cheers,
Longman