Re: [PATCH v3] locking/pvqspinlock: Relax cmpxchg's to improve performance on some archs
From: Will Deacon
Date: Tue Feb 21 2017 - 08:04:15 EST
On Mon, Feb 20, 2017 at 12:58:39PM +0800, Boqun Feng wrote:
> > So Waiman, the fact is that in this case, we want the following code
> > sequence:
> >
> > CPU 0                                   CPU 1
> > =================                       ====================
> > {pn->state = vcpu_running, node->locked = 0}
> >
> > smp_store_mb(&pn->state, vcpu_halted):
> >   WRITE_ONCE(pn->state, vcpu_halted);
> >   smp_mb();
> > r1 = READ_ONCE(node->locked);
> >                                         arch_mcs_spin_unlock_contended();
> >                                           WRITE_ONCE(node->locked, 1)
> >
> >                                         cmpxchg(&pn->state, vcpu_halted, vcpu_hashed);
> >
> > never ends up in:
> >
> > r1 == 0 && cmpxchg fails (i.e. the read part of cmpxchg reads the
> > value vcpu_running).
> >
> > We can have such a guarantee if cmpxchg has a smp_mb() before its load
> > part, which is true for PPC. But semantically, cmpxchg() doesn't provide
> > any order guarantee if it fails, which is true on ARM64, IIUC. (Add Will
> > in Cc for his insight ;-)).
I think you're right. The write to node->locked on CPU1 is not required
to be ordered before the load part of the failing cmpxchg.
> > So a possible "fix"(in case ARM64 will use qspinlock some day), would be
> > replace cmpxchg() with smp_mb() + cmpxchg_relaxed().
Perversely, we could actually get away with cmpxchg_acquire on arm64 because
arch_mcs_spin_unlock_contended is smp_store_release and we order release ->
acquire in the architecture. But that just brings up the age-old unlock/lock
discussion again...
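
For illustration only, the smp_mb() + cmpxchg_relaxed() shape quoted above can
be approximated in userspace C11. This is a rough sketch, not kernel code:
halted_to_hashed(), demo() and the enum are hypothetical names, and a C11
seq_cst fence is only an analogue of smp_mb(), not an exact equivalent. The
point it shows is that the fence is issued unconditionally, so the ordering no
longer depends on whether the compare-and-swap succeeds.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the pv-qspinlock vCPU states. */
enum vcpu_state { vcpu_running, vcpu_halted, vcpu_hashed };

/*
 * Userspace analogue of smp_mb() + cmpxchg_relaxed(): the full fence is
 * executed before the compare-and-swap regardless of its outcome, so the
 * load part of a failing swap is still ordered after earlier accesses.
 * Returns true if the state was moved from vcpu_halted to vcpu_hashed.
 */
static bool halted_to_hashed(_Atomic int *state)
{
	int expected = vcpu_halted;

	atomic_thread_fence(memory_order_seq_cst);	/* stands in for smp_mb() */
	return atomic_compare_exchange_strong_explicit(state, &expected,
						       vcpu_hashed,
						       memory_order_relaxed,
						       memory_order_relaxed);
}

/* Single-threaded check of the success and failure paths. */
static void demo(void)
{
	_Atomic int state = vcpu_halted;

	/* Success path: halted -> hashed. */
	assert(halted_to_hashed(&state));
	assert(atomic_load(&state) == vcpu_hashed);

	/* Failure path: state is left untouched, but the fence still ran. */
	atomic_store(&state, vcpu_running);
	assert(!halted_to_hashed(&state));
	assert(atomic_load(&state) == vcpu_running);
}
```

Calling demo() only exercises the two return paths on one thread; it does not
demonstrate the cross-CPU ordering itself, which would need a litmus test.
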
Will