Re: [PATCH] ipc/sem: Fix race between to-be-woken task and waker

From: Manfred Spraul
Date: Fri Sep 27 2019 - 00:59:09 EST


Hi,
On 9/26/19 8:12 PM, Waiman Long wrote:
On 9/26/19 5:34 AM, Peter Zijlstra wrote:
On Fri, Sep 20, 2019 at 11:54:02AM -0400, Waiman Long wrote:
While looking at a customer bug report about a potential missed wakeup in
the System V semaphore code, I spotted a potential problem: a semaphore
waiter stays in the TASK_RUNNING state while checking the queue status,
which may lead to a missed wakeup if a spurious wakeup happens at the
right moment, since try_to_wake_up() will do nothing if the task state
isn't right.

To eliminate this possibility, the task state is now reset to
TASK_INTERRUPTIBLE immediately after wakeup, before the queue status is
checked. This should eliminate the race between the queue status and the
task state and fix the potential missed wakeup problem.
You are obviously right; there is a huge race condition.
Bah, this code always makes my head hurt.

Yes, AFAICT the pattern it uses has been broken since 0a2b9d4c7967,
since that removed doing the actual wakeup from under the sem_lock(),
which is what it relies on.
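
To make the suspected race concrete, here is a simplified sketch - not the actual ipc/sem.c code, and with wake_up_process() standing in for the wake_q machinery - of how a wakeup can get lost once the waker runs outside sem_lock():

/* Waiter, just returned from schedule() after a spurious wakeup,
 * so current->state == TASK_RUNNING:
 */
if (READ_ONCE(queue->status) == -EINTR) {      /* (1) not woken yet */
        /* ... the waker runs step (2) right here ... */
        set_current_state(TASK_INTERRUPTIBLE); /* (3) too late */
        schedule();                            /* sleeps, wakeup lost */
}

/* Waker, no longer running under sem_lock(): */
WRITE_ONCE(queue->status, error);              /* (2) publish the result */
wake_up_process(sleeper);                      /* (2) no-op: task is TASK_RUNNING */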

Correct - I've overlooked that.

First, theory:

Setting queue->status, reading queue->status, and setting current->state = TASK_INTERRUPTIBLE all happen under the correct spinlock.

(There is an opportunistic read of queue->status without the lock, but it is retried once the lock has been acquired.)

Setting current->state = TASK_RUNNING happens outside of any lock.

So as far as current->state is concerned, the lock doesn't exist. And if the lock doesn't exist, we must follow the rules for set_current_state().

I'll try to check the code this week.

And we should check the remaining wake-queue users; the logic is identical everywhere.
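
For reference, the rules in question are the classic condition/state ordering that set_current_state() is documented for; a minimal sketch of the required pattern ("done" is a placeholder wakeup condition):

/* sleeper: publish the task state before testing the condition */
for (;;) {
        set_current_state(TASK_INTERRUPTIBLE); /* store state, with barrier */
        if (done)                              /* then test the condition */
                break;
        schedule();
}
__set_current_state(TASK_RUNNING);

/* waker: publish the condition before waking */
done = 1;
wake_up_process(sleeper); /* try_to_wake_up() provides a full barrier */

Either the sleeper's test sees the waker's store of "done", or wake_up_process() sees a task that has already moved to TASK_INTERRUPTIBLE; one of the two always holds.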

After taking a second look at the code, I probably misread it the first
time around. In the sleeping path, both the check of queue.status and the
setting of the task state are done under the sem lock. So as long as the
setting of the queue status is also done under the lock, they should
synchronize properly.
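
For context, the sleeping path in question, heavily simplified from ipc/sem.c (timeouts, signal handling and error paths elided):

/* heavily simplified sketch of the do_semtimedop() sleep loop */
locknum = sem_lock(sma, sops, nsops);
WRITE_ONCE(queue.status, -EINTR);
queue.sleeper = current;

for (;;) {
        __set_current_state(TASK_INTERRUPTIBLE); /* under sem_lock() */
        sem_unlock(sma, locknum);

        schedule();

        error = READ_ONCE(queue.status); /* opportunistic, lockless */
        if (error != -EINTR)
                break;                   /* a waker published the result */

        locknum = sem_lock(sma, sops, nsops);
        error = READ_ONCE(queue.status); /* retried under the lock */
        if (error != -EINTR) {
                sem_unlock(sma, locknum);
                break;
        }
        /* still -EINTR: go around; the state is set under the lock again */
}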

It looks like the queue status setting is done under lock, but I can't use
lockdep to confirm that, as the locking can be done either through the
array lock or through one of the per-semaphore spinlocks in the array. Are
you aware of a way of doing that?

For testing? Have you considered just always using the global lock?

(untested):

--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -370,7 +370,7 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
 	struct sem *sem;
 	int idx;
 
-	if (nsops != 1) {
+	if (nsops != 1 || 1) {
 		/* Complex operation - acquire a full lock */
 		ipc_lock_object(&sma->sem_perm);

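The "|| 1" forces every operation down the complex path, so everything serializes on the global lock. If a permanent assertion is wanted instead of a test hack, one could also open-code a check that accepts either lock. A hypothetical helper (untested, not an existing ipc/sem.c function, and weaker than lockdep, since spin_is_locked() cannot tell who holds the lock):

/* Hypothetical debug helper: warn unless either the global array lock
 * or the given per-semaphore lock is held by somebody.
 */
static void sem_assert_locked(struct sem_array *sma, struct sem *sem)
{
        WARN_ON_ONCE(!spin_is_locked(&sma->sem_perm.lock) &&
                     !spin_is_locked(&sem->lock));
}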

Anyway, I do think we need to add a comment to clarify the situation and
avoid future confusion.

Around line 190 is the comment that explains locking & memory ordering.

I have only documented the content of sem_undo and sem_array, but neither queue nor current->state :-(
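
A starting point for such a comment, capturing only what has been established in this thread (a sketch, still to be verified):

/*
 * Waiter/waker interaction (DRAFT):
 *
 * - queue.status is written by the waker and read by the waiter under
 *   sem_lock(); there is one additional opportunistic lockless read,
 *   which is retried under the lock.
 * - The waiter sets current->state = TASK_INTERRUPTIBLE while still
 *   holding sem_lock(), before dropping it.
 * - The wakeup itself (wake_up_q()) runs after sem_lock() has been
 *   dropped, so for current->state the rules for set_current_state()
 *   apply, not the lock.
 */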


--

    Manfred