[PATCH] locking/osq_lock: Minimize access to node->prev's cacheline

From: Waiman Long
Date: Wed Dec 20 2023 - 20:20:31 EST


When CONFIG_PARAVIRT_SPINLOCKS is on, osq_lock() will spin on both the
node's and node->prev's cachelines while waiting for its node->locked
value to be set. That can cause cacheline contention with another CPU
modifying node->prev's cacheline. The only reason for accessing
node->prev's cacheline is to read the node->prev->cpu value to pass to
the vcpu_is_preempted() call. However, this value remains constant as
long as node->prev doesn't change.
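
For reference, the wait loop in the current osq_lock() (abridged from
the diff below) is:

	if (smp_cond_load_relaxed(&node->locked, VAL || need_resched() ||
				  vcpu_is_preempted(node_cpu(node->prev))))
		return true;

Every re-evaluation of that condition dereferences node->prev to fetch
its ->cpu value, pulling in node->prev's cacheline even though that
value cannot change while node->prev itself stays the same.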

Minimize that contention by caching the node->prev and node->prev->cpu
values, and update the cached values (the only access to node->prev's
cacheline) only when node->prev changes. This is done in a new
prev_vcpu_is_preempted() helper.

With an osq_lock/unlock microbenchmark that repeatedly calls
osq_lock/unlock in a loop using 96 locking threads on a 2-socket
96-CPU machine, the locking rates before and after this patch were as
follows (a sketch of the per-thread loop is shown after the numbers):

Before patch: 1701 kops/s
After patch: 3052 kops/s
% Change: +79.4%
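
For illustration only, the per-thread loop of such a microbenchmark can
be sketched as below. This is hypothetical code, not the actual test
harness used for the numbers above; it assumes kernel-built-in test
code, since osq_lock()/osq_unlock() are not exported to modules, and
the name osq_bench_thread() is made up for the example:

	#include <linux/kthread.h>
	#include <linux/osq_lock.h>

	static struct optimistic_spin_queue bench_lock = OSQ_LOCK_UNLOCKED;

	static int osq_bench_thread(void *data)
	{
		unsigned long *nr_ops = data;	/* per-thread op counter */

		while (!kthread_should_stop()) {
			/* osq_lock() returns false if the spin was aborted */
			if (osq_lock(&bench_lock)) {
				osq_unlock(&bench_lock);
				(*nr_ops)++;
			}
		}
		return 0;
	}

In such a setup, the reported kops/s would correspond to the aggregate
rate of completed lock/unlock pairs across all 96 threads.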

This patch thus provides a significant performance boost for the OSQ
lock.

Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
---
kernel/locking/osq_lock.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index d5610ad52b92..6d7218f4b6e4 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -87,12 +87,29 @@ osq_wait_next(struct optimistic_spin_queue *lock,
 	return next;
 }
 
+#ifndef vcpu_is_preempted
+#define prev_vcpu_is_preempted(n, p, c)	false
+#else
+static inline bool prev_vcpu_is_preempted(struct optimistic_spin_node *node,
+					  struct optimistic_spin_node **pprev,
+					  int *ppvcpu)
+{
+	struct optimistic_spin_node *prev = READ_ONCE(node->prev);
+
+	if (prev != *pprev) {
+		*pprev = prev;
+		*ppvcpu = node_cpu(prev);
+	}
+	return vcpu_is_preempted(*ppvcpu);
+}
+#endif
+
 bool osq_lock(struct optimistic_spin_queue *lock)
 {
 	struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
 	struct optimistic_spin_node *prev, *next;
 	int curr = encode_cpu(smp_processor_id());
-	int old;
+	int old, pvcpu;
 
 	node->locked = 0;
 	node->next = NULL;
@@ -110,6 +127,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 
 	prev = decode_cpu(old);
 	node->prev = prev;
+	pvcpu = node_cpu(prev);
 
 	/*
 	 * osq_lock()			unqueue
@@ -141,7 +159,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 	 * polling, be careful.
 	 */
 	if (smp_cond_load_relaxed(&node->locked, VAL || need_resched() ||
-				  vcpu_is_preempted(node_cpu(node->prev))))
+				  prev_vcpu_is_preempted(node, &prev, &pvcpu)))
 		return true;
 
 	/* unqueue */
--
2.39.3