[PATCH] ipc/sem: Fix race between to-be-woken task and waker

From: Waiman Long
Date: Fri Sep 20 2019 - 11:54:25 EST


While looking at a customr bug report about potential missed wakeup in
the system V semaphore code, I spot a potential problem. The fact that
semaphore waiter stays in TASK_RUNNING state while checking queue status
may lead to missed wakeup if a spurious wakeup happens in the right
moment as try_to_wake_up() will do nothing if the task state isn't right.

To eliminate this possibility, the task state is now reset to
TASK_INTERRUPTIBLE immediately after wakeup before checking the queue
status. This should eliminate the race condition on the interaction
between the queue status and the task state and fix the potential missed
wakeup problem.

Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
---
ipc/sem.c | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/ipc/sem.c b/ipc/sem.c
index 7da4504bcc7c..1bcd424be047 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -2146,11 +2146,11 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
sma->complex_count++;
}

+ __set_current_state(TASK_INTERRUPTIBLE);
do {
WRITE_ONCE(queue.status, -EINTR);
queue.sleeper = current;

- __set_current_state(TASK_INTERRUPTIBLE);
sem_unlock(sma, locknum);
rcu_read_unlock();

@@ -2159,6 +2159,24 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
else
schedule();

+ /*
+ * A spurious wakeup at the right moment can cause race
+ * between the to-be-woken task and the waker leading to
+ * missed wakeup. Setting state back to TASK_INTERRUPTIBLE
+ * before checking queue.status will ensure that the race
+ * won't happen.
+ *
+ * CPU0 CPU1
+ *
+ * <spurious wakeup> wake_up_sem_queue_prepare():
+ * state = TASK_INTERRUPTIBLE status = error
+ * try_to_wake_up():
+ * smp_mb() smp_mb()
+ * if (status == -EINTR) if (!(p->state & state))
+ * schedule() goto out
+ */
+ set_current_state(TASK_INTERRUPTIBLE);
+
/*
* fastpath: the semop has completed, either successfully or
* not, from the syscall pov, is quite irrelevant to us at this
@@ -2210,6 +2228,7 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
sem_unlock(sma, locknum);
rcu_read_unlock();
out_free:
+ __set_current_state(TASK_RUNNING);
if (sops != fast_sops)
kvfree(sops);
return error;
--
2.18.1