[RFC] locking/rwsem: Avoid issuing wakeup before setting the reader waiter to nil

From: Yongji Xie
Date: Thu Nov 29 2018 - 07:50:37 EST


From: Xie Yongji <xieyongji@xxxxxxxxx>

Our system encountered a problem recently, the khungtaskd detected
some process hang on mmap_sem. But the odd thing was that one task which
is not on mmap_sem.wait_list still sleeps in rwsem_down_read_failed().
Through code inspection, we found a potential bug can lead to this.

Imaging this:

Thread 1 Thread 2
down_write();
rwsem_down_read_failed()
raw_spin_lock_irq(&sem->wait_lock);
list_add_tail(&waiter.list, &wait_list);
raw_spin_unlock_irq(&sem->wait_lock);
__up_write();
rwsem_wake();
__rwsem_mark_wake();
wake_q_add();
list_del(&waiter->list);
waiter->task = NULL;
while (true) {
set_current_state(TASK_UNINTERRUPTIBLE);
if (!waiter.task) // true
break;
}
__set_current_state(TASK_RUNNING);

Now Thread 1 is queued in Thread 2's wake_q without sleeping. Then
Thread 1 call rwsem_down_read_failed() again because Thread 3
hold the lock, if Thread 3 tries to queue Thread 1 before Thread 2
do wakeup, it will fail and miss wakeup:

Thread 1 Thread 2 Thread 3
down_write();
rwsem_down_read_failed()
raw_spin_lock_irq(&sem->wait_lock);
list_add_tail(&waiter.list, &wait_list);
raw_spin_unlock_irq(&sem->wait_lock);
__rwsem_mark_wake();
wake_q_add();
wake_up_q();
waiter->task = NULL;
while (true) {
set_current_state(TASK_UNINTERRUPTIBLE);
if (!waiter.task) // false
break;
schedule();
}
wake_up_q(&wake_q);

In another word, that means we might issue the wakeup before setting the reader
waiter to nil. If so, the wakeup may do nothing when it was called before reader
set task state to TASK_UNINTERRUPTIBLE. Then we would have no chance to wake up
the reader any more, and cause other writers such as "ps" command stuck on it.

This patch is not verified because we still have no way to reproduce the problem.
But I'd like to ask for some comments from community firstly.

Signed-off-by: Xie Yongji <xieyongji@xxxxxxxxx>
Signed-off-by: Zhang Yu <zhangyu31@xxxxxxxxx>
---
kernel/locking/rwsem-xadd.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 09b1800..50d9af6 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -198,15 +198,22 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
woken++;
tsk = waiter->task;

- wake_q_add(wake_q, tsk);
+ get_task_struct(tsk);
list_del(&waiter->list);
/*
- * Ensure that the last operation is setting the reader
+ * Ensure calling get_task_struct() before setting the reader
* waiter to nil such that rwsem_down_read_failed() cannot
* race with do_exit() by always holding a reference count
* to the task to wakeup.
*/
smp_store_release(&waiter->task, NULL);
+ /*
+ * Ensure issuing the wakeup (either by us or someone else)
+ * after setting the reader waiter to nil.
+ */
+ wake_q_add(wake_q, tsk);
+ /* wake_q_add() already take the task ref */
+ put_task_struct(tsk);
}

adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment;
--
2.2.3