Re: [RESEND PATCH v5] locking/pvqspinlock: Relax cmpxchg's to improve performance on some archs
From: Paul E. McKenney
Date: Thu Aug 10 2017 - 16:49:53 EST
On Thu, Aug 10, 2017 at 11:13:17AM +0200, Peter Zijlstra wrote:
> On Thu, Aug 10, 2017 at 04:12:13PM +0800, Boqun Feng wrote:
>
> > > Or is the reason this doesn't work on PPC that it's RCpc?
>
> So that :-)
>
> > Here is an example why PPC needs a sync() before the cmpxchg():
> >
> > https://marc.info/?l=linux-kernel&m=144485396224519&w=2
> >
> > and Paul McKenney's detailed explanation about why this could happen:
> >
> > https://marc.info/?l=linux-kernel&m=144485909826241&w=2
> >
> > (Somehow, I feel like he was answering a question similar to the one
> > you ask here ;-))
>
> Yes, and I had vague memories of having gone over this before, but
> couldn't quickly find things. Thanks!
>
> > And I think aarch64 doesn't have a problem here because it is "(other)
> > multi-copy atomic". Will?
>
> Right, it's the RCpc vs RCsc thing. The ARM64 release is as you say
> multi-copy atomic, whereas the PPC lwsync is not.
>
> This still leaves us with the situation that we need an smp_mb() between
> smp_store_release() and a possibly failing cmpxchg() if we want to
> guarantee the cmpxchg()'s load comes after the store-release.
For whatever it is worth, this is why C11 allows specifying one
memory-order strength for the success case and another for the failure
case. But it is not immediately clear that we need another level
of combinatorial API explosion...
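
As a rough illustration of the C11 facility mentioned above (a sketch in
portable C11, not kernel code; lock_word and both helper names are
hypothetical and exist only for this example):
atomic_compare_exchange_strong_explicit() takes one memory order for the
success case and a second one for the failure case. The first helper asks
for acquire ordering only when the exchange succeeds and leaves a failing
attempt as a relaxed load; the second asks for sequentially consistent
ordering in both cases.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word for illustration; 0 = free, 1 = held. */
static atomic_int lock_word;

/*
 * Try to take the lock. On success the read-modify-write has acquire
 * ordering. On failure only a load is performed, and it is relaxed.
 */
static bool try_lock_weak_failure(void)
{
	int expected = 0;

	return atomic_compare_exchange_strong_explicit(&lock_word, &expected, 1,
						       memory_order_acquire,
						       memory_order_relaxed);
}

/*
 * Same operation, but the failure case is also sequentially consistent,
 * so even a failing attempt is an ordered load.
 */
static bool try_lock_strong_failure(void)
{
	int expected = 0;

	return atomic_compare_exchange_strong_explicit(&lock_word, &expected, 1,
						       memory_order_seq_cst,
						       memory_order_seq_cst);
}
```

Both helpers return the same result for the same lock state; they differ
only in the ordering requested. How the stronger failure ordering would
map onto the smp_mb() requirement discussed above is not something this
sketch tries to settle.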
Thanx, Paul