[PATCH 1/1] locking/qspinlock: Make the 1st spinner only spin on locked_pending bits
From: Qiuxu Zhuo
Date: Mon May 08 2023 - 04:16:11 EST
The 1st spinner (the head of the MCS queue) spins on the whole qspinlock
variable to check whether the lock is released. For a contended qspinlock,
this spinning is a hotspot, as each CPU queued in the MCS queue performs
it once it becomes the 1st spinner.

The Load-Store Unit (LSU) inside a core can track the memory accesses of
the SMT h/w threads sharing that core at byte granularity. Making the 1st
spinner spin only on the locked_pending bits (not the whole qspinlock)
therefore avoids a false dependency between the tail field and the
locked_pending field. This micro-optimization helps the h/w thread (the
1st spinner) stay in a low-power state and prevents it from being woken
up by other h/w threads in the same core when they perform xchg_tail()
to update only the tail field. Please see a similar discussion in the
link.
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@xxxxxxxxx>
---
 kernel/locking/qspinlock.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index efebbf19f887..e7b990b28610 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -513,7 +513,20 @@ void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 	if ((val = pv_wait_head_or_lock(lock, node)))
 		goto locked;
 
+#if _Q_PENDING_BITS == 8
+	/*
+	 * Spinning on the 2-byte locked_pending instead of the 4-byte qspinlock
+	 * variable can avoid the false dependency between the tail field and
+	 * the locked_pending field. This helps the h/w thread (the 1st spinner)
+	 * stay in a low power state and prevents it from being woken up by other
+	 * h/w threads in the same core when they perform xchg_tail() to update
+	 * the tail field only.
+	 */
+	smp_cond_load_acquire(&lock->locked_pending, !VAL);
+	val = atomic_read_acquire(&lock->val);
+#else
 	val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK));
+#endif
 
 locked:
 	/*
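For reference, with _Q_PENDING_BITS == 8 the tail update that would
otherwise wake the 1st spinner is already confined to the high halfword
of the lock word. The sketch below abridges the existing xchg_tail()
helper in kernel/locking/qspinlock.c (comment added here for
illustration):

	static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
	{
		/*
		 * A relaxed 16-bit exchange on &lock->tail: it never writes
		 * the locked_pending bytes that the 1st spinner now monitors.
		 */
		return (u32)xchg_relaxed(&lock->tail,
					 tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
	}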