Re: [PATCH v3 next 5/5] Avoid writing to node->next in the osq_lock() fast path
From: David Laight
Date: Sat Mar 07 2026 - 06:33:17 EST
On Fri, 6 Mar 2026 16:06:25 -0800
Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Fri, 6 Mar 2026 at 14:52, <david.laight.linux@xxxxxxxxx> wrote:
> >
> > From: David Laight <david.laight.linux@xxxxxxxxx>
> >
> > When osq_lock() returns false or osq_unlock() returns static
> > analysis shows that node->next should always be NULL.
>
> This explanation makes me nervous.
>
> *What* static analysis? It's very unclear. And the "should be NULL"
> doesn't make me get the warm and fuzzies.
The analysis was 'my brain' about two years ago :-)
> For example, osq_unlock() does do
>
> node = this_cpu_ptr(&osq_node);
> next = xchg(&node->next, NULL);
>
> so it's clearly NULL after that. But it's not obvious this will be
> reached, because osq_unlock() does that
>
> /*
> * Fast path for the uncontended case.
> */
> if (atomic_try_cmpxchg_release(&lock->tail, &curr, OSQ_UNLOCKED_VAL))
> return;
>
> before it actually gets to this point.
That is (should be) checking that the list only contains a single node.
So the 'next' pointer can't be set.
I'll drink 10 double-espressos and read the code again.
I'll also check what happens if the code is hit by an ethernet interrupt
just after the initial xchg().
Other cpu can also acquire the lock and then get preempted (so need to
unlink themselves) as well as the holder trying to release the lock.
It might be that the initial xchg needs to be a cmpxchg - which would
massively simplify the 'lock' path.
It does have the 'no forwards progress' issue.
I can't remember whether xchg has to be implemented as cmpxchg on any
architectures, if it does then the current complex locking code
is unnecessary.
David
>
> And yes, I'm very willing to believe that if we hit that fast-path,
> node->next (which is "curr->next" in that path) is indeed NULL, but I
> think this commit message really needs to spell it all out.
>
> No "should be NULL", in other words. I want a rock-solid "node->next
> is always NULL because XYZ" explanation, not a wishy-washy "static
> analysis says" without spelling it out.
>
> Linus