Re: [tip:locking/core] tools/memory-model: Add extra ordering for locks and remove it for ordinary release/acquire

From: Paul E. McKenney
Date: Thu Sep 09 2021 - 13:46:39 EST


[+ Nick Piggin]

On Thu, Sep 09, 2021 at 02:35:36PM +0100, Will Deacon wrote:
> [+Palmer, PaulW, Daniel and Michael]
>
> On Thu, Sep 09, 2021 at 09:25:30AM +0200, Peter Zijlstra wrote:
> > On Wed, Sep 08, 2021 at 09:08:33AM -0700, Linus Torvalds wrote:
> >
> > > So if this is purely a RISC-V thing,
> >
> > Just to clarify, I think the current RISC-V thing is stronger than
> > PowerPC, but maybe not as strong as say ARM64, but RISC-V memory
> > ordering is still somewhat hazy to me.
> >
> > Specifically, the sequence:
> >
> > /* critical section s */
> > WRITE_ONCE(x, 1);
> > FENCE RW, W
> > WRITE_ONCE(s.lock, 0); /* store S */
> > AMOSWAP %0, 1, r.lock /* store R */
> > FENCE R, RW
> > WRITE_ONCE(y, 1);
> > /* critical section r */
> >
> > fully separates section s from section r, as in RW->RW ordering
> > (possibly not as strong as smp_mb() though), while on PowerPC it would
> > only impose TSO ordering between sections.
> >
> > The AMOSWAP is a RmW and as such matches the W from the RW->W fence,
> > similarly it matches the R from the R->RW fence, yielding an:
> >
> > RW-> W
> > RmW
> > R ->RW
> >
> > ordering. It's the stores S and R that can be re-ordered, but not the
> > sections themselves (same on PowerPC and many others).
> >
> > Clarification from a RISC-V enabled person would be appreciated.
> >
> > > then I think it's entirely reasonable to
> > >
> > > spin_unlock(&r);
> > > spin_lock(&s);
> > >
> > > cannot be reordered.
> >
> > I'm obviously completely in favour of that :-)
>
> I don't think we should require the accesses to the actual lockwords to
> be ordered here, as it becomes pretty onerous for relaxed LL/SC
> architectures where you'd end up with an extra barrier either after the
> unlock() or before the lock() operation. However, I remain absolutely in
> favour of strengthening the ordering of the _critical sections_ guarded by
> the locks to be RCsc.

If by this you mean the critical sections when observed only by other
critical sections for a given lock, then everyone is already there.

However...

> Last time this came up, I think the RISC-V folks were generally happy to
> implement whatever was necessary for Linux [1]. The thing that was stopping
> us was Power (see CONFIG_ARCH_WEAK_RELEASE_ACQUIRE), wasn't it? I think
> Michael saw quite a bit of variety in the impact on benchmarks [2] across
> different machines. So the question is whether newer Power machines are less
> affected to the degree that we could consider making this change again.

Last I knew, on Power a pair of critical sections for a given lock could
be observed out of order (writes from the earlier critical section vs.
reads from the later critical section), but only by CPUs not holding
that lock. Also last I knew, tightening this would require upgrading
some of the locking primitives' lwsync instructions to sync instructions.
But I know very little about Power 10.
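
For concreteness, here is a rough userspace C11 sketch of the unlock-then-lock
sequence Peter showed, with a switch between a release-only unlock (roughly the
lwsync flavor) and one with a full fence in front of the store (roughly the
sync flavor). Everything in it is an assumption made for illustration:
toy_lock, toy_lock_acquire(), toy_unlock() and handoff() are hypothetical
names, not kernel primitives, and the C11 memory model is neither LKMM nor any
particular hardware model. It only shows where the fences would sit, not what
ordering Linux guarantees.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy lock, not the kernel's spinlock_t: 0 = free, 1 = held. */
struct toy_lock {
	atomic_int val;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	/* Acquire RmW: stands in for "AMOSWAP; FENCE R, RW" in the sequence. */
	while (atomic_exchange_explicit(&l->val, 1, memory_order_acquire))
		;	/* spin until the previous value was 0 */
}

static void toy_unlock(struct toy_lock *l, bool full_fence)
{
	if (full_fence) {
		/* Assumed analogue of upgrading lwsync to sync: full fence, then store. */
		atomic_thread_fence(memory_order_seq_cst);
		atomic_store_explicit(&l->val, 0, memory_order_relaxed);
	} else {
		/* Release store: stands in for "FENCE RW, W; store" (or lwsync; store). */
		atomic_store_explicit(&l->val, 0, memory_order_release);
	}
}

/* One CPU leaves critical section s and then enters critical section r. */
static int handoff(struct toy_lock *s, struct toy_lock *r,
		   atomic_int *x, atomic_int *y, bool full_fence)
{
	toy_lock_acquire(s);
	atomic_store_explicit(x, 1, memory_order_relaxed);	/* critical section s */
	toy_unlock(s, full_fence);				/* store S */

	toy_lock_acquire(r);					/* store R */
	assert(atomic_load_explicit(&r->val, memory_order_relaxed) == 1);
	atomic_store_explicit(y, 1, memory_order_relaxed);	/* critical section r */
	toy_unlock(r, full_fence);

	return atomic_load_explicit(x, memory_order_relaxed) +
	       atomic_load_explicit(y, memory_order_relaxed);
}
```

Run from a single thread, handoff() returns 2 with either setting of
full_fence. Any difference between the two variants could only show up to
another CPU that holds neither lock, and only on sufficiently weakly ordered
hardware, which is the case being discussed here.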

Adding Nick on CC for his thoughts.

Thanx, Paul

> Will
>
> [1] https://lore.kernel.org/lkml/11b27d32-4a8a-3f84-0f25-723095ef1076@xxxxxxxxxx/
> [2] https://lore.kernel.org/lkml/87tvp3xonl.fsf@xxxxxxxxxxxxxxxxxxxxxxxx/