Re: [PATCH v2] locking/pvqspinlock: Relax cmpxchg's to improve performance on some archs

From: Boqun Feng
Date: Tue Feb 07 2017 - 23:10:43 EST


On Wed, Feb 08, 2017 at 11:39:10AM +0800, Xinhui Pan wrote:
> 2016-12-26 4:26 GMT+08:00 Waiman Long <longman@xxxxxxxxxx>:
>
> > A number of cmpxchg calls in qspinlock_paravirt.h were replaced by more
> > relaxed versions to improve performance on architectures that use LL/SC.
> >
> > All the locking related cmpxchg's are replaced with the _acquire
> > variants:
> > - pv_queued_spin_steal_lock()
> > - trylock_clear_pending()
> >
> > The cmpxchg's related to hashing are replaced by either by the _release
> > or the _relaxed variants. See the inline comment for details.
> >
> > Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
> >
> > v1->v2:
> > - Add comments in changelog and code for the rationale of the change.
> >
> > ---
> > kernel/locking/qspinlock_paravirt.h | 50 ++++++++++++++++++++++++------
> > -------
> > 1 file changed, 33 insertions(+), 17 deletions(-)
> >
> >
> > @@ -323,8 +329,14 @@ static void pv_wait_node(struct mcs_spinlock *node,
> > struct mcs_spinlock *prev)
> > * If pv_kick_node() changed us to vcpu_hashed, retain that
> > * value so that pv_wait_head_or_lock() knows to not also
> > try
> > * to hash this lock.
> > + *
> > + * The smp_store_mb() and control dependency above will
> > ensure
> > + * that state change won't happen before that.
> > Synchronizing
> > + * with pv_kick_node() wrt hashing by this waiter or by the
> > + * lock holder is done solely by the state variable. There
> > is
> > + * no other ordering requirement.
> > */
> > - cmpxchg(&pn->state, vcpu_halted, vcpu_running);
> > + cmpxchg_relaxed(&pn->state, vcpu_halted, vcpu_running);
> >
> > /*
> > * If the locked flag is still not set after wakeup, it is
> > a
> > @@ -360,9 +372,12 @@ static void pv_kick_node(struct qspinlock *lock,
> > struct mcs_spinlock *node)
> > * pv_wait_node(). If OTOH this fails, the vCPU was running and
> > will
> > * observe its next->locked value and advance itself.
> > *
> > - * Matches with smp_store_mb() and cmpxchg() in pv_wait_node()
> > + * Matches with smp_store_mb() and cmpxchg_relaxed() in
> > pv_wait_node().
> > + * A release barrier is used here to ensure that node->locked is
> > + * always set before changing the state. See comment in
> > pv_wait_node().
> > */
> > - if (cmpxchg(&pn->state, vcpu_halted, vcpu_hashed) != vcpu_halted)
> > + if (cmpxchg_release(&pn->state, vcpu_halted, vcpu_hashed)
> > + != vcpu_halted)
> > return;
> >
> > hi, Waiman
> We can't use _release here, a full barrier is needed.
>
> There is pv_kick_node vs pv_wait_head_or_lock
>
> [w] l->locked = _Q_SLOW_VAL //reordered here
>
> if (READ_ONCE(pn->state) == vcpu_hashed) //False.
>
> lp = (struct qspinlock **)1;
>
> [STORE] pn->state = vcpu_hashed lp = pv_hash(lock,
> pn);
> pv_hash() if
> (xchg(&l->locked, _Q_SLOW_VAL) == 0) // fasle, not unhashed.
>

This analysis is correct, but..

> Then the same lock has hashed twice but only unhashed once. So at last as
> the hash table grows big, we hit RCU stall.
>
> I hit RCU stall when I run netperf benchmark
>

how will a big hash table hit RCU stall? Do you have the call trace for
your RCU stall?

Regards,
Boqun

> thanks
> xinhui
>
>
> > --
> > 1.8.3.1
> >
> >

Attachment: signature.asc
Description: PGP signature