Re: [PATCH] ipc/sem: Fix race between to-be-woken task and waker

From: Peter Zijlstra
Date: Thu Sep 26 2019 - 05:34:24 EST


On Fri, Sep 20, 2019 at 11:54:02AM -0400, Waiman Long wrote:
> While looking at a customr bug report about potential missed wakeup in
> the system V semaphore code, I spot a potential problem. The fact that
> semaphore waiter stays in TASK_RUNNING state while checking queue status
> may lead to missed wakeup if a spurious wakeup happens in the right
> moment as try_to_wake_up() will do nothing if the task state isn't right.
>
> To eliminate this possibility, the task state is now reset to
> TASK_INTERRUPTIBLE immediately after wakeup before checking the queue
> status. This should eliminate the race condition on the interaction
> between the queue status and the task state and fix the potential missed
> wakeup problem.

Bah, this code always makes my head hurt.

Yes, AFAICT the pattern it uses has been broken since 0a2b9d4c7967,
since that removed doing the actual wakeup from under the sem_lock(),
which is what it relies on.

> Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
> ---
> ipc/sem.c | 21 ++++++++++++++++++++-
> 1 file changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/ipc/sem.c b/ipc/sem.c
> index 7da4504bcc7c..1bcd424be047 100644
> --- a/ipc/sem.c
> +++ b/ipc/sem.c
> @@ -2146,11 +2146,11 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
> sma->complex_count++;
> }
>
> + __set_current_state(TASK_INTERRUPTIBLE);

This can use a comment; this relies on the fact that this is in the
same sem_lock() section that added us to the queue.

> do {
> WRITE_ONCE(queue.status, -EINTR);
> queue.sleeper = current;
>
> - __set_current_state(TASK_INTERRUPTIBLE);
> sem_unlock(sma, locknum);
> rcu_read_unlock();
>
> @@ -2159,6 +2159,24 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
> else
> schedule();
>
> + /*
> + * A spurious wakeup at the right moment can cause race
> + * between the to-be-woken task and the waker leading to
> + * missed wakeup. Setting state back to TASK_INTERRUPTIBLE
> + * before checking queue.status will ensure that the race
> + * won't happen.
> + *
> + * CPU0 CPU1
> + *
> + * <spurious wakeup> wake_up_sem_queue_prepare():
> + * state = TASK_INTERRUPTIBLE status = error
> + * try_to_wake_up():
> + * smp_mb() smp_mb()
> + * if (status == -EINTR) if (!(p->state & state))
> + * schedule() goto out
> + */
> + set_current_state(TASK_INTERRUPTIBLE);
> +
> /*
> * fastpath: the semop has completed, either successfully or
> * not, from the syscall pov, is quite irrelevant to us at this
> @@ -2210,6 +2228,7 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
> sem_unlock(sma, locknum);
> rcu_read_unlock();
> out_free:
> + __set_current_state(TASK_RUNNING);
> if (sops != fast_sops)
> kvfree(sops);
> return error;
> --
> 2.18.1
>