Re: [PATCH v3 next 5/5] Avoid writing to node->next in the osq_lock() fast path

From: Waiman Long

Date: Wed Mar 11 2026 - 15:27:28 EST


On 3/6/26 5:51 PM, david.laight.linux@xxxxxxxxx wrote:
From: David Laight <david.laight.linux@xxxxxxxxx>

When osq_lock() returns false or osq_unlock() returns static
analysis shows that node->next should always be NULL.
This means that it isn't necessary to explicitly set it to NULL
prior to atomic_xchg(&lock->tail, curr) on entry to osq_lock().

Defer determining the address of the CPU's 'node' until after the
atomic_exchange() so that it isn't done in the uncontented path.

Signed-off-by: David Laight <david.laight.linux@xxxxxxxxx>
---
kernel/locking/osq_lock.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 0619691e2756..3f0cfdf1cd0f 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -92,13 +92,10 @@ osq_wait_next(struct optimistic_spin_queue *lock,
bool osq_lock(struct optimistic_spin_queue *lock)
{
- struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
- struct optimistic_spin_node *prev, *next;
+ struct optimistic_spin_node *node, *prev, *next;
unsigned int curr = encode_cpu(smp_processor_id());
unsigned int prev_cpu;
- node->next = NULL;

Although it does look like node->next should always be NULL when entering osq_lock(), any future change may invalidate this assumption. I know you want to not touch the osq_node cacheline on fast path, but we will need a big comment here to explicitly spell out this assumption to make sure that we won't break it in the future.

BTW, how much performance gain have you measured with this change? Can we just leave it there to be safe.

-
/*
* We need both ACQUIRE (pairs with corresponding RELEASE in
* unlock() uncontended, or fastpath) and RELEASE (to publish
@@ -109,6 +106,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
if (prev_cpu == OSQ_UNLOCKED_VAL)
return true;
+ node = this_cpu_ptr(&osq_node);
WRITE_ONCE(node->prev_cpu, prev_cpu);
prev = decode_cpu(prev_cpu);
node->locked = 0;

I am fine with moving the initialization here. The other patches also look good to me.

Cheers,
Longman