Re: [PATCH 2/2] [PATCH] sched: Add smp_rmb() in task rq locking cycles

From: Chris Metcalf
Date: Tue Apr 28 2015 - 11:53:40 EST

On 04/28/2015 10:33 AM, Peter Zijlstra wrote:
On Sun, Apr 26, 2015 at 03:52:13AM -0700, Paul E. McKenney wrote:

And then an smp_read_barrier_depends() would be needed either here
or embedded in apin_unlock_wait(). But we also need to check the
spin_unlock_wait() implementations to see if any are potentially
vulnerable to compiler misbehavior due to lack of ACCESS_ONCE(),
READ_ONCE(), or other sources of the required volatility:

o tile: For 32-bit, looks like a bug. Compares ->current_ticket and
->next_ticket with no obvious protection. The compiler is free to
load them in either order, so it is possible that the two fields
could compare equal despite never having actually been equal at
any given time. Needs something like arm, arm64, mips, or x86
to do single fetch, then compare fields in quantity fetched.

Except that this appears to be using int on a 32-bit system,
thus might not have a 64-bit load. If that is the case, the
trick would be to load them in order. Except that this can be
defeated by overflow. Are there really 32-bit tile systems with
enough CPUs to overflow an unsigned short?

As you surmise, tilepro doesn't have 64-bit loads. So we are stuck with
32-bit loads on these two fields. It's true that spin_unlock_wait() can
therefore falsely claim that the lock is unlocked, but it should be only a
hint anyway, since by the time the caller tries to act on that information
the lock may have been retaken anyway, right? If spin_unlock_wait() is
really trying to guarantee that the lock was available at some point in
the interval between when it was called and when it returned, we could use
READ_ONCE() to read the current ticket value first; is that a necessary
part of the semantics?

(Even with READ_ONCE we'd still be exposed to a technical risk that
others cores had taken and released the lock 2 billion times in between
the two loads of the core running spin_unlock_wait, without ever having
the lock actually be free, so technically the only solution is for that core
to actually acquire and release the lock, but that seems a bit extreme
in practice.)

The reason we use two 32-bit fields on tilepro is that the only available
atomic instruction is tns (test and set), which sets a 32-bit "1" value
into the target memory and returns the old 32-bit value. So we need to
be able to safely "corrupt" the next_ticket value with a "1", load the
current_ticket value, and if they don't match, rewrite next_ticket with
its old value. We can't safely do this if next_ticket and current_ticket
are 16-bit fields in one 32-bit word, since the "tns" operation would
corrupt the current_ticket value in that case, making someone
waiting on ticket 0 think they owned the lock.

On tilegx we made the atomic instruction situation much, much better :-)

For 64-bit, a READ_ONCE() appears to be in order -- no obvious
volatility present.

It depends, I guess. If you are spinning on arch_spin_is_locked(), yes,
you need to make sure to do something to ensure the value is re-read.
But arch_spin_unlock_wait() already calls delay_backoff(), which calls
relax(), which includes a barrier(), so we're OK there. But if stylistically
the consensus calls for a READ_ONCE() in arch_spin_is_locked(), I
can certainly add that. What do folks think?

Assuming the answers to both questions is "change the code",
how does this look?

diff --git a/arch/tile/include/asm/spinlock_32.h b/arch/tile/include/asm/spinlock_32.h
index c0a77b38d39a..7c7b80bd83db 100644
--- a/arch/tile/include/asm/spinlock_32.h
+++ b/arch/tile/include/asm/spinlock_32.h
@@ -41,8 +41,23 @@ static inline int arch_spin_is_locked(arch_spinlock_t *lock)
* to claim the lock is held, since it will be momentarily
* if not already. There's no need to wait for a "valid"
* lock->next_ticket to become available.
+ *
+ * We order the reads here so that if we claim the lock is
+ * unlocked, we know it actually was for at least a moment.
+ * Since current_ticket is never incremented above
+ * next_ticket, by reading current first, then next, and
+ * finding them equal, we know that during that window between
+ * the reads the lock was unlocked.
+ *
+ * There is a technical risk in this that between reading
+ * current and reading next, other cores locked and unlocked
+ * two billion times without the lock ever being unlocked, and
+ * therefore it looks like the lock was at some point unlocked
+ * but it never was. But this seems highly improbable.
- return lock->next_ticket != lock->current_ticket;
+ int current = READ_ONCE(lock->current_ticket);
+ int next = READ_ONCE(lock->next_ticket);
+ return next != current;
void arch_spin_lock(arch_spinlock_t *lock);
diff --git a/arch/tile/include/asm/spinlock_64.h b/arch/tile/include/asm/spinlock_64.h
index 9a12b9c7e5d3..b9718fb4e74a 100644
--- a/arch/tile/include/asm/spinlock_64.h
+++ b/arch/tile/include/asm/spinlock_64.h
@@ -18,6 +18,8 @@
+#include <linux/compiler.h>
/* Shifts and masks for the various fields in "lock". */
#define __ARCH_SPIN_NEXT_MASK 0x7fff
@@ -44,7 +46,8 @@ static inline u32 arch_spin_next(u32 val)
/* The lock is locked if a task would have to wait to get it. */
static inline int arch_spin_is_locked(arch_spinlock_t *lock)
- u32 val = lock->lock;
+ /* Use READ_ONCE() to ensure that calling this in a loop is OK. */
+ u32 val = READ_ONCE(lock->lock);
return arch_spin_current(val) != arch_spin_next(val);

Chris Metcalf, EZChip Semiconductor

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at