Re: [BUG]locking/rwsem: only clean RWSEM_FLAG_HANDOFF when already set
From: Waiman Long
Date: Sun Nov 07 2021 - 10:24:48 EST
On 11/7/21 04:01, Hillf Danton wrote:
On Sat, 6 Nov 2021 23:25:38 -0400 Waiman Long wrote:
On 11/6/21 08:39, 马振华 wrote:
Dear longman,
Recently, I found an issue where the rwsem count becomes a negative value. It
always happens when a task tries to get the lock with __down_write_killable()
and is then killed.
The issue happens like this:
CPU2                                        CPU4
task A [reader]                             task B [writer]

down_read_killable()  [locked]
  sem->count = 0x100
                                            down_write_killable()
                                              sem->count = 0x102  [wait list not empty]
up_read()
  sem->count = 0x2
                                            SIGKILL received
down_read_killable()
  sem->count = 0x102  [wait list not empty]
                                            goto out_nolock:
                                              list_del(&waiter.list);
                                              wait list is now empty
                                              sem->count -= RWSEM_FLAG_HANDOFF
                                              sem->count = 0xFE
                                              list_empty(&sem->wait_list) is TRUE
                                              sem->count &= ~RWSEM_FLAG_WAITERS
                                              sem->count = 0xFC
up_read()
  sem->count -= 0x100
  sem->count = 0xFFFFFFFFFFFFFFFC
  DEBUG_RWSEMS_WARN_ON(tmp < 0, sem);

So sem->count will be negative after the writer is killed.
I think that if the RWSEM_FLAG_HANDOFF flag is not set, we shouldn't clear it.
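For reference, the hex values in the trace follow from the rwsem count layout in
kernel/locking/rwsem.c at the time of the report (RWSEM_WRITER_LOCKED = 0x1,
RWSEM_FLAG_WAITERS = 0x2, RWSEM_FLAG_HANDOFF = 0x4, RWSEM_READER_BIAS = 0x100).
A minimal user-space replay of the count arithmetic, as an illustration only (not
kernel code):

#include <stdio.h>

#define RWSEM_WRITER_LOCKED     (1UL << 0)
#define RWSEM_FLAG_WAITERS      (1UL << 1)
#define RWSEM_FLAG_HANDOFF      (1UL << 2)
#define RWSEM_READER_BIAS       (1UL << 8)

int main(void)
{
        long count = 0;

        count += RWSEM_READER_BIAS;     /* A: down_read_killable        -> 0x100 */
        count |= RWSEM_FLAG_WAITERS;    /* B: queues as a writer        -> 0x102 */
        count -= RWSEM_READER_BIAS;     /* A: up_read                   -> 0x002 */
        count += RWSEM_READER_BIAS;     /* A: down_read_killable again  -> 0x102 */
        count -= RWSEM_FLAG_HANDOFF;    /* B: out_nolock, bit never set -> 0x0FE */
        count &= ~RWSEM_FLAG_WAITERS;   /* B: wait list now empty       -> 0x0FC */
        count -= RWSEM_READER_BIAS;     /* A: up_read                   -> -4    */

        printf("final count = %#lx (%ld)\n", (unsigned long)count, count);
        return 0;
}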
Thanks for reporting this possible race condition.
However, I am still trying to figure out how it is possible to set
wstate to WRITER_HANDOFF without actually setting the handoff bit as
well. The statement sequence should be as follows:
        wstate = WRITER_HANDOFF;
        raw_spin_lock_irq(&sem->wait_lock);
        if (rwsem_try_write_lock(sem, wstate))
                break;
        raw_spin_unlock_irq(&sem->wait_lock);
            :
        if (signal_pending_state(state, current))
                goto out_nolock;
The rwsem_try_write_lock() function will make sure that we either
acquire the lock and clear handoff or set the handoff bit. This should
be done before we actually check for the signal. I do think that it is
probably safer to use atomic_long_andnot() to clear the handoff bit
instead of using atomic_long_add().

Given that WRITER_HANDOFF does not guarantee that RWSEM_FLAG_HANDOFF is set
in rwsem_try_write_lock(), the flag should be cleared only by its setter to
avoid a count underflow.

Hillf
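A plain user-space illustration of the atomic_long_andnot() vs atomic_long_add()
point above (ordinary longs instead of atomic_long_t; the values mirror the trace):

#include <stdio.h>

#define RWSEM_FLAG_HANDOFF      (1UL << 2)

int main(void)
{
        long count = 0x102;     /* one reader + RWSEM_FLAG_WAITERS, handoff bit NOT set */

        /* unconditional subtraction: it bleeds into the reader count */
        printf("add -HANDOFF: %#lx\n", (unsigned long)(count - RWSEM_FLAG_HANDOFF));    /* 0xfe  */

        /* andnot: a harmless no-op when the bit is already clear */
        printf("andnot:       %#lx\n", (unsigned long)(count & ~RWSEM_FLAG_HANDOFF));   /* 0x102 */
        return 0;
}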
I did have a tentative patch to address this issue which is somewhat
similar to your approach. However, I would like to further investigate
the exact mechanics of the race condition to make sure that I won't miss
a latent bug somewhere else in the rwsem code.
-Longman
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index c51387a43265..20be967620c0 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -347,7 +347,8 @@ enum rwsem_wake_type {
 enum writer_wait_state {
         WRITER_NOT_FIRST,       /* Writer is not first in wait list */
         WRITER_FIRST,           /* Writer is first in wait list     */
-        WRITER_HANDOFF          /* Writer is first & handoff needed */
+        WRITER_NEED_HANDOFF,    /* Writer is first & handoff needed */
+        WRITER_HANDOFF          /* Writer is first & handoff set    */
 };
 
 /*
@@ -532,11 +533,11 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
  * race conditions between checking the rwsem wait list and setting the
  * sem->count accordingly.
  *
- * If wstate is WRITER_HANDOFF, it will make sure that either the handoff
+ * If wstate is WRITER_NEED_HANDOFF, it will make sure that either the handoff
  * bit is set or the lock is acquired with handoff bit cleared.
  */
 static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
-                                        enum writer_wait_state wstate)
+                                        enum writer_wait_state *wstate)
 {
         long count, new;
@@ -546,13 +547,13 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
         do {
                 bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
 
-                if (has_handoff && wstate == WRITER_NOT_FIRST)
+                if (has_handoff && *wstate == WRITER_NOT_FIRST)
                         return false;
 
                 new = count;
 
                 if (count & RWSEM_LOCK_MASK) {
-                        if (has_handoff || (wstate != WRITER_HANDOFF))
+                        if (has_handoff || (*wstate != WRITER_NEED_HANDOFF))
                                 return false;
 
                         new |= RWSEM_FLAG_HANDOFF;
@@ -569,8 +570,10 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
          * We have either acquired the lock with handoff bit cleared or
          * set the handoff bit.
          */
-        if (new & RWSEM_FLAG_HANDOFF)
+        if (new & RWSEM_FLAG_HANDOFF) {
+                *wstate = WRITER_HANDOFF;
                 return false;
+        }
 
         rwsem_set_owner(sem);
         return true;
@@ -1083,7 +1086,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
         /* wait until we successfully acquire the lock */
         set_current_state(state);
         for (;;) {
-                if (rwsem_try_write_lock(sem, wstate)) {
+                if (rwsem_try_write_lock(sem, &wstate)) {
                         /* rwsem_try_write_lock() implies ACQUIRE on success */
                         break;
                 }
@@ -1138,7 +1141,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
                          */
                         if ((wstate == WRITER_FIRST) && (rt_task(current) ||
                             time_after(jiffies, waiter.timeout))) {
-                                wstate = WRITER_HANDOFF;
+                                wstate = WRITER_NEED_HANDOFF;
                                 lockevent_inc(rwsem_wlock_handoff);
                                 break;
                         }
@@ -1159,7 +1162,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
         list_del(&waiter.list);
 
         if (unlikely(wstate == WRITER_HANDOFF))
-                atomic_long_add(-RWSEM_FLAG_HANDOFF, &sem->count);
+                atomic_long_andnot(RWSEM_FLAG_HANDOFF, &sem->count);
 
         if (list_empty(&sem->wait_list))
                 atomic_long_andnot(RWSEM_FLAG_WAITERS, &sem->count);
--
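For clarity, the idea behind the tentative patch can be sketched in user space
(enum names taken from the diff above; this is only an illustration of the intended
state flow, not the kernel implementation): the waiter merely asks for a handoff
with WRITER_NEED_HANDOFF, rwsem_try_write_lock() upgrades it to WRITER_HANDOFF only
when it has really set RWSEM_FLAG_HANDOFF in sem->count, and the out_nolock path
clears the bit only in that case, with an andnot that cannot underflow the reader
count.

#include <stdbool.h>
#include <stdio.h>

#define RWSEM_WRITER_LOCKED     (1UL << 0)
#define RWSEM_FLAG_WAITERS      (1UL << 1)
#define RWSEM_FLAG_HANDOFF      (1UL << 2)
#define RWSEM_READER_BIAS       (1UL << 8)
#define RWSEM_READER_MASK       (~(RWSEM_READER_BIAS - 1))
#define RWSEM_LOCK_MASK         (RWSEM_WRITER_LOCKED | RWSEM_READER_MASK)

enum writer_wait_state {
        WRITER_NOT_FIRST,       /* Writer is not first in wait list */
        WRITER_FIRST,           /* Writer is first in wait list     */
        WRITER_NEED_HANDOFF,    /* Writer is first & handoff needed */
        WRITER_HANDOFF          /* Writer is first & handoff set    */
};

/* Simplified stand-in for rwsem_try_write_lock() after the patch. */
static bool try_write_lock(unsigned long *count, enum writer_wait_state *wstate)
{
        if (*count & RWSEM_LOCK_MASK) {
                if (*wstate != WRITER_NEED_HANDOFF)
                        return false;
                *count |= RWSEM_FLAG_HANDOFF;
                *wstate = WRITER_HANDOFF;       /* upgraded only when the bit is really set */
                return false;
        }
        /* lock is free: acquire it (a set handoff bit would be cleared here) */
        return true;
}

int main(void)
{
        unsigned long count = 0x102;    /* one reader + RWSEM_FLAG_WAITERS, as in the trace */
        enum writer_wait_state wstate = WRITER_NEED_HANDOFF;

        try_write_lock(&count, &wstate);

        /* out_nolock: only the confirmed setter may clear the bit */
        if (wstate == WRITER_HANDOFF)
                count &= ~RWSEM_FLAG_HANDOFF;

        printf("count = %#lx, wstate = %d\n", count, wstate);
        return 0;
}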