[PATCH AUTOSEL 4.19 31/65] sched/wake_q: Fix wakeup ordering for wake_q

From: Sasha Levin
Date: Thu Feb 14 2019 - 21:32:49 EST


From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>

[ Upstream commit 4c4e3731564c8945ac5ac90fc2a1e1f21cb79c92 ]

Notable cmpxchg() does not provide ordering when it fails, however
wake_q_add() requires ordering in this specific case too. Without this
it would be possible for the concurrent wakeup to not observe our
prior state.

Andrea Parri provided:

C wake_up_q-wake_q_add

{
int next = 0;
int y = 0;
}

P0(int *next, int *y)
{
int r0;

/* in wake_up_q() */

WRITE_ONCE(*next, 1); /* node->next = NULL */
smp_mb(); /* implied by wake_up_process() */
r0 = READ_ONCE(*y);
}

P1(int *next, int *y)
{
int r1;

/* in wake_q_add() */

WRITE_ONCE(*y, 1); /* wake_cond = true */
smp_mb__before_atomic();
r1 = cmpxchg_relaxed(next, 1, 2);
}

exists (0:r0=0 /\ 1:r1=0)

This "exists" clause cannot be satisfied according to the LKMM:

Test wake_up_q-wake_q_add Allowed
States 3
0:r0=0; 1:r1=1;
0:r0=1; 1:r1=0;
0:r0=1; 1:r1=1;
No
Witnesses
Positive: 0 Negative: 3
Condition exists (0:r0=0 /\ 1:r1=0)
Observation wake_up_q-wake_q_add Never 0 3

Reported-by: Yongji Xie <elohimes@xxxxxxxxx>
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Cc: Davidlohr Bueso <dave@xxxxxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Waiman Long <longman@xxxxxxxxxx>
Cc: Will Deacon <will.deacon@xxxxxxx>
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
---
kernel/sched/core.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 13ddfa46d741..152a0b0c91bb 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -405,10 +405,11 @@ void wake_q_add(struct wake_q_head *head, struct task_struct *task)
* its already queued (either by us or someone else) and will get the
* wakeup due to that.
*
- * This cmpxchg() executes a full barrier, which pairs with the full
- * barrier executed by the wakeup in wake_up_q().
+ * In order to ensure that a pending wakeup will observe our pending
+ * state, even in the failed case, an explicit smp_mb() must be used.
*/
- if (cmpxchg(&node->next, NULL, WAKE_Q_TAIL))
+ smp_mb__before_atomic();
+ if (cmpxchg_relaxed(&node->next, NULL, WAKE_Q_TAIL))
return;

get_task_struct(task);
--
2.19.1