Re: [RFC][PATCH 0/3] locking/qspinlock: Improve determinism for x86

From: Waiman Long
Date: Wed Sep 26 2018 - 19:21:54 EST


On 09/26/2018 01:51 PM, Peter Zijlstra wrote:
> On Wed, Sep 26, 2018 at 12:20:09PM -0400, Waiman Long wrote:
>> On 09/26/2018 07:01 AM, Peter Zijlstra wrote:
>>> Back when Will did his qspinlock determinism patches, we were left with one
>>> cmpxchg loop on x86 due to the use of atomic_fetch_or(). Will proposed a nifty
>>> trick:
>>>
>>> http://lkml.kernel.org/r/20180409145409.GA9661@xxxxxxx
>>>
>>> But at the time we didn't pursue it. This series implements that and argues for
>>> its correctness. In particular it places an smp_mb__after_atomic() in
>>> between the two operations, which forces the load to come after the
>>> store (which is free on x86 anyway).
>>>
>>> In particular this ordering ensures a concurrent unlock cannot trigger
>>> the uncontended handoff. Also it ensures that if the xchg() happens
>>> after a (successful) trylock, we must observe that LOCKED bit.
>> When you said "concurrent unlock cannot trigger the uncontended
>> handoff", are you saying the current code has this uncontended handoff
>> problem or just when comparing to doing a load first followed by xchg()?
> In particular I'm talking about this:
>
> locked:
> /*
> * claim the lock:
> *
> * n,0,0 -> 0,0,1 : lock, uncontended
> * *,0,0 -> *,0,1 : lock, contended
> *
> * If the queue head is the only one in the queue (lock value == tail)
> * and nobody is pending, clear the tail code and grab the lock.
> * Otherwise, we only need to grab the lock.
> */
> /*
> * In the PV case we might already have _Q_LOCKED_VAL set, because
> * of lock stealing; therefore we must also allow:
> *
> * n,0,1 -> 0,0,1
> *
> * Note: at this point: (val & _Q_PENDING_MASK) == 0, because of the
> * above wait condition, therefore any concurrent setting of
> * PENDING will make the uncontended transition fail.
> */
> if ((val & _Q_TAIL_MASK) == tail) {
> if (atomic_try_cmpxchg_relaxed(&lock->val, &val, _Q_LOCKED_VAL))
> goto release; /* No contention */
> }
>
>
> Here queue head acquires the lock and we take the "uncontended" path. If
> we set PENDING prior to this, the cmpxchg above will fail, and the later
> load will observe the tail.

Thanks for the clarification. I was wondering if you are talking about a
latent bug in the code. So this is not the case here.

Cheers,
Longman