Re: [PATCH 02/10] locking/qspinlock: Remove unbounded cmpxchg loop from locking slowpath
From: Will Deacon
Date: Mon Apr 09 2018 - 13:19:50 EST
On Mon, Apr 09, 2018 at 05:54:20PM +0200, Peter Zijlstra wrote:
> On Mon, Apr 09, 2018 at 03:54:09PM +0100, Will Deacon wrote:
> > @@ -289,18 +315,26 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
> > return;
> >
> > /*
> > - * If we observe any contention; queue.
> > + * If we observe queueing, then queue ourselves.
> > */
> > - if (val & ~_Q_LOCKED_MASK)
> > + if (val & _Q_TAIL_MASK)
> > goto queue;
> >
> > /*
> > + * We didn't see any queueing, so have one more try at snatching
> > + * the lock in case it became available whilst we were taking the
> > + * slow path.
> > + */
> > + if (queued_spin_trylock(lock))
> > + return;
> > +
> > + /*
> > * trylock || pending
> > *
> > * 0,0,0 -> 0,0,1 ; trylock
> > * 0,0,1 -> 0,1,1 ; pending
> > */
> > + val = set_pending_fetch_acquire(lock);
> > if (!(val & ~_Q_LOCKED_MASK)) {
>
> So, if I remember that partial paper correctly, the atomic_read_acquire()
> can see 'arbitrary' old values for everything except the pending byte,
> which it just wrote and will fwd into our load, right?
>
> But I think coherence requires the read to not be older than the one
> observed by the trylock before (since it uses c-cas its acquire can be
> elided).
>
> I think this means we can miss a concurrent unlock vs the fetch_or. And
> I think that's fine: if we still see the lock set we'll needlessly 'wait'
> for it to become unlocked.
Ah, but there is a related case that doesn't work. If the lock becomes
free just before we set pending, then another CPU can succeed on the
fastpath. We'll then set pending, but the lock word we read back may
still show a locked byte of 0, so two CPUs end up holding the lock.
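Concretely, with CPU0 as the initial owner and us on CPU2 having just
failed the trylock above:

	CPU0:	queued_spin_unlock()		0,0,1 -> 0,0,0
	CPU1:	fastpath trylock succeeds	0,0,0 -> 0,0,1
	CPU2:	set_pending_fetch_acquire()	pending byte -> 1

The atomic_read_acquire() on CPU2 only has to return a value no older
than the one its failed trylock observed, so it can legitimately return
the 0,0,0 from CPU0's unlock and miss CPU1's 0,0,1. CPU2 then sees the
locked byte as 0, sets it and clears pending, and both CPU1 and CPU2
believe they own the lock.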
I think it's worth giving this a go with the added trylock, but I can't
see a way to avoid the atomic_fetch_or at the moment.
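That is, staying with something along the lines of:

	val = atomic_fetch_or_acquire(_Q_PENDING_VAL, &lock->val);

where setting pending and fetching the old lock word are a single RMW on
the whole word, so a fastpath locker in the window above is guaranteed to
be visible in the locked byte we go on to inspect.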
Will