Re: [PATCH v3] locking/pvqspinlock: restore/set vcpu_hashed state after failing adaptive locking spinning

From: Waiman Long
Date: Fri Jul 15 2016 - 12:45:06 EST

On 07/15/2016 03:45 AM, Wanpeng Li wrote:
2016-07-15 15:09 GMT+08:00 Peter Zijlstra<peterz@xxxxxxxxxxxxx>:
On Fri, Jul 15, 2016 at 05:26:40AM +0800, Wanpeng Li wrote:
2016-07-14 22:52 GMT+08:00 Waiman Long<waiman.long@xxxxxxx>:
As pv_kick_node() is called immediately after designating the next node as
the queue head, the chance of this racing is possible, but is not likely
unless the lock holder vCPU gets preempted for a long time at that right
moment. This change does not do any harm though, so I am OK with that.
However, I do want you to add a comment about the possible race in the code
as it isn't that obvious or likely.
How about something like:

* If the lock holder vCPU gets preempted for a long time, pv_kick_node will
* advance its state and hash the lock, restore/set the vcpu_hashed state to
* avoid the race.
So I'm not sure. Yes it was a bug, but its fairly 'obvious' it should be
I believe Waiman can give a better comments. :)

Yes, setting the state to vcpu_hashed is the more obvious choice. What I said is not obvious is that there can be a race between the new lock holder in pv_kick_node() and the new queue head trying to call pv_wait(). And it is what I want to document it. Maybe something more graphical can help:

* lock holder vCPU queue head vCPU
* ---------------- ---------------
* node->locked = 1;
* <preemption> READ_ONCE(node->locked)
* ... pv_wait_head_or_lock():
* pv_hash();
* lock->locked = _Q_SLOW_VAL;
* node->state = vcpu_hashed;
* pv_kick_node():
* cmpxchg(node->state,
* vcpu_halted, vcpu_hashed);
* lock->locked = _Q_SLOW_VAL;
* pv_hash();
* With preemption at the right moment, it is possible that both the
* lock holder and queue head vCPUs can be racing to set node->state.
* Making sure the state is never set to vcpu_halted will prevent this
* racing from happening.