Re: [PATCH 1/4] spinlock: Document memory barrier rules

From: Peter Zijlstra
Date: Mon Aug 29 2016 - 09:44:35 EST

On Mon, Aug 29, 2016 at 02:54:54PM +0200, Manfred Spraul wrote:
> Hi Peter,
> On 08/29/2016 12:48 PM, Peter Zijlstra wrote:
> >On Sun, Aug 28, 2016 at 01:56:13PM +0200, Manfred Spraul wrote:
> >>Right now, the spinlock machinery tries to guarantee barriers even for
> >>unorthodox locking cases, which ends up as a constant stream of updates
> >>as the architectures try to support new unorthodox ideas.
> >>
> >>The patch proposes to reverse that:
> >>spin_lock is ACQUIRE, spin_unlock is RELEASE.
> >>spin_unlock_wait is also ACQUIRE.
> >>Code that needs further guarantees must use appropriate explicit barriers.
> >>
> >>Architectures that can implement some barriers for free can define the
> >>barriers as NOPs.
> >>
> >>As the initial step, the patch converts ipc/sem.c to the new defines:
> >>- no more smp_rmb() after spin_unlock_wait(), that is part of
> >> spin_unlock_wait()
> >>- smp_mb__after_spin_lock() instead of a direct smp_mb().
> >>
> >Why? This does not explain why..
> Which explanation is missing?
> - removal of the smb_rmb() after spin_unlock_wait?

So that should have been a separate patch. This thing doing two things
is wrong too. But no, this I get. I did make spin_unlock_wait() an
ACQUIRE after all.

> - Why smp_mb is required after spin_lock? See Patch 02, I added the race
> that exists on real hardware.
> Exactly the same issue exists for sem.c
> - Why introduce a smp_mb__after_spin_lock()?
> The other options would be:
> - same as RCU, i.e. add CONFIG_PPC into sem.c and nf_contrack_core.c
> - overhead for all archs by added an unconditional smp_mb()

See, this too doesn't adequately explain the situation, since all refers
to other sources.

If you add a barrier, the Changelog had better be clear. And I'm still
not entirely sure I get what exactly this barrier should do, nor why it
defaults to a full smp_mb. If what I suspect it should do, only PPC and
ARM64 need the barrier.

And x86 doesn't need it -- _however_ it would need it if you require
full smp_mb semantics, which I suspect you don't.

Which brings us back to a very poor definition of what this barrier
should be doing.