Re: [PATCH] locking/pvqspinlock: Hybrid PV queued/unfair locks

From: Waiman Long
Date: Tue Nov 07 2017 - 14:48:20 EST


On 11/07/2017 02:41 PM, Peter Zijlstra wrote:
> On Tue, Nov 07, 2017 at 11:39:02AM -0500, Waiman Long wrote:
>> On 11/07/2017 07:58 AM, Peter Zijlstra wrote:
>>> On Fri, Nov 03, 2017 at 11:35:31AM -0400, Waiman Long wrote:
>>>> This patch combines the best attributes of an unfair lock and a
>>>> pvqspinlock into a hybrid lock with 2 modes - queued mode & unfair
>>>> mode. A lock waiter goes into the unfair mode when there are waiters
>>>> in the wait queue but the pending bit isn't set. Otherwise, it will
>>>> go into the queued mode waiting in the queue for its turn.
>>> You forgot to test a starvation case. And you also forgot to describe
>>> how they cannot happen. I really cannot remember how all this is
>>> supposed to work.
>> Lock starvation shouldn't happen. The pending bit is set by the queue
>> head to indicate its readiness before spinning on the lock. Once the
>> pending bit is made visible to all the CPUs, no one can steal the lock
>> and they will all queue up in the wait queue.
> So the queue path of queued_spin_lock_slowpath() does
> queued_spin_trylock() which, for PV, ends up being that
> pv_queued_spin_steal_lock(), which you changed to spin until PENDING.
>
> Now PENDING is set by pv_wait_head_or_lock(), but that is far after
> queued_spin_trylock().
>
> The way I'm reading this is that we'll never get there... because we'll
> all be spinning in pv_queued_spin_steal_lock().
>
> So how will we fail pv_queued_spin_steal_lock() in order to then end up
> in pv_wait_head_or_lock() to set PENDING such that
> pv_queued_spin_steal_lock() will fail?
>
> The comment with your new pv_queued_spin_steal_lock() should very
> clearly spell out how this works.

I did add a comment to say that the lock waiter will stay in unfair mode
only when there are WAITERS and the pending bit is not set. So if no CPU
is in the wait queue, it will just do one trylock, as before the patch,
and then move on to the regular slowpath. I will update the comment to
clarify that by waiters I mean waiters in the wait queue.

>
>> I ran my locking microbenchmark on a 2-socket 36-core system (HT off)
>> with # of locking threads equals to the number of vCPUs in a kvm VM. The
>> table below shows the min/mean/max per-thread locking ops done in 5
>> seconds:
>>
>> #vCPUs min/avg/max lockops
>> 36 822,135/881,063/950,363
>> 54 542,435/581,664/625,937
>> 72 397,500/428,177/499,299
>>
>> It is obvious that there was no lock starvation here.
> It is however not obvious if that is the worst case; at the very least
> you should compare to the test-and-set variant which has known
> starvation. If this test doesn't show starvation with the test-and-set
> then this test is bad.

Sure, I will add the unfair lock case for comparison.

Cheers,
Longman