Re: [tip:locking/core] tools/memory-model: Add extra ordering for locks and remove it for ordinary release/acquire

From: Will Deacon
Date: Fri Sep 10 2021 - 07:08:38 EST


Hi Paul,

On Thu, Sep 09, 2021 at 10:46:35AM -0700, Paul E. McKenney wrote:
> On Thu, Sep 09, 2021 at 02:35:36PM +0100, Will Deacon wrote:
> > On Thu, Sep 09, 2021 at 09:25:30AM +0200, Peter Zijlstra wrote:
> > > On Wed, Sep 08, 2021 at 09:08:33AM -0700, Linus Torvalds wrote:
> > > > then I think it's entirely reasonable to
> > > >
> > > > spin_unlock(&r);
> > > > spin_lock(&s);
> > > >
> > > > cannot be reordered.
> > >
> > > I'm obviously completely in favour of that :-)
> >
> > I don't think we should require the accesses to the actual lockwords to
> > be ordered here, as it becomes pretty onerous for relaxed LL/SC
> > architectures where you'd end up with an extra barrier either after the
> > unlock() or before the lock() operation. However, I remain absolutely in
> > favour of strengthening the ordering of the _critical sections_ guarded by
> > the locks to be RCsc.
>
> If by this you mean the critical sections when observed only by other
> critical sections for a given lock, then everyone is already there.

No, I mean the case where somebody without the lock (but using memory
barriers) can observe the critical sections out of order (i.e. W -> R
order is not maintained).
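
As a rough sketch of that shape (not taken from the kernel tree), the
store-buffering pattern below is written in the LKMM litmus format. The
test name and the variable names are made up for illustration. P0 writes x
in one critical section and reads y in the next; P1 never takes the lock
and relies on smp_mb() alone. The expectation, which is an assumption to
confirm with herd7 rather than a verified result, is that today's RCtso
model allows the "exists" clause and that RCsc locking would forbid it.

```c
C SB+unlock-lock+mb-sketch

(*
 * Hypothetical test: can a CPU that does not hold the lock observe the
 * write from P0's first critical section as happening after the read
 * in P0's second critical section (W -> R order not maintained)?
 *)

{}

P0(int *x, int *y, spinlock_t *s)
{
	int r0;

	spin_lock(s);
	WRITE_ONCE(*x, 1);	/* W in the first critical section */
	spin_unlock(s);
	spin_lock(s);
	r0 = READ_ONCE(*y);	/* R in the second critical section */
	spin_unlock(s);
}

P1(int *x, int *y)
{
	int r1;

	WRITE_ONCE(*y, 1);
	smp_mb();		/* observer uses a full barrier, no lock */
	r1 = READ_ONCE(*x);
}

exists (0:r0=0 /\ 1:r1=0)
```

If the tools are set up, something along the lines of
"herd7 -conf linux-kernel.cfg <file>.litmus" from tools/memory-model should
report whether the outcome is reachable.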

> However...
>
> > Last time this came up, I think the RISC-V folks were generally happy to
> > implement whatever was necessary for Linux [1]. The thing that was stopping
> > us was Power (see CONFIG_ARCH_WEAK_RELEASE_ACQUIRE), wasn't it? I think
> > Michael saw quite a bit of variety in the impact on benchmarks [2] across
> > different machines. So the question is whether newer Power machines are less
> > affected to the degree that we could consider making this change again.
>
> Last I knew, on Power a pair of critical sections for a given lock could
> be observed out of order (writes from the earlier critical section vs.
> reads from the later critical section), but only by CPUs not holding
> that lock. Also last I knew, tightening this would require upgrading
> some of the locking primitives' lwsync instructions to sync instructions.
> But I know very little about Power 10.

Yup, that's the one. This is the primary reason why we have the confusing
"RCtso" model today, so this is my periodic "Do we still need this?" poke
for the Power folks :)

If the SYNC is a disaster for Power, then I'll ask again in another ~3 years'
time in the hope that newer micro-architectures can swallow the instruction
more easily. But the results last time weren't hugely compelling, so _if_
there's an opportunity to make locking more "obvious" then I'm all for it.

Will