Re: [RESEND PATCH v3] locking/pvqspinlock: Add lock holder CPU argument to pv_wait()

From: Waiman Long
Date: Thu Apr 28 2016 - 16:50:43 EST


On 04/28/2016 09:32 AM, Boqun Feng wrote:
Hi Waiman,

On Thu, Apr 21, 2016 at 11:02:05AM -0400, Waiman Long wrote:
Pan Xinhui was asking for a lock holder CPU argument in pv_wait()
to help the porting of pvqspinlock to PPC. The new argument can
help the hypervisor expedite the execution of the critical section
by the lock holder: if the holder's vCPU isn't running, the
hypervisor can schedule it so that it releases the lock sooner.

The pv_wait() function is now extended to take a third argument
that holds the vCPU number of the lock holder, if this is
known. Architecture-specific hypervisor code can then make use of
that information to make a better decision about which vCPU should
run next.
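
For illustration, the extended interface looks roughly like this (a
sketch only; the parameter name lockcpu is mine, and the -1 convention
for "holder unknown" follows the call sites discussed below):

/*
 * ptr     - pointer to the halt state byte to monitor
 * val     - the value *ptr is expected to hold while halted
 * lockcpu - vCPU number of the likely lock holder, or -1 if unknown
 */
void pv_wait(u8 *ptr, u8 val, int lockcpu);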

This patch also adds a new field to the pv_node structure to hold
the vCPU number of the previous queue entry. For the queue head,
the prev_cpu entry is likely to be that of the lock holder,
though lock stealing may render this information inaccurate.
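
A sketch of the result, assuming the pv_node layout currently in
kernel/locking/qspinlock_paravirt.h; only the prev_cpu member is new:

struct pv_node {
	struct mcs_spinlock	mcs;
	struct mcs_spinlock	__res[3];

	int			cpu;
	u8			state;
	int			prev_cpu;	/* vCPU # of previous queue entry */
};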

In pv_wait_head_or_lock(), pv_wait() will now be called with that
prev_cpu value, as it is likely to identify the lock holder.
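
The call site change there should be essentially a one-liner, roughly
(quoting the existing call from qspinlock_paravirt.h from memory, not
the patch hunk itself):

-		pv_wait(&l->locked, _Q_SLOW_VAL);
+		pv_wait(&l->locked, _Q_SLOW_VAL, pn->prev_cpu);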

In pv_wait_node(), the newly added pv_lookup_hash() function will
be called to look up the queue head; if found, the lock holder vCPU
number stored there is passed to pv_wait().
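
Since the information is only advisory, a best-effort lookup suffices.
Modeled on the existing pv_hash()/pv_unhash() helpers, a sketch of
pv_lookup_hash() might look like the following (the real version also
bounds the search to a few cachelines, per the v3 notes below):

static struct pv_node *pv_lookup_hash(struct qspinlock *lock)
{
	unsigned long offset, hash = hash_ptr(lock, pv_lock_hash_bits);
	struct pv_hash_entry *he;

	for_each_hash_entry(he, offset, hash) {
		struct qspinlock *l = READ_ONCE(he->lock);

		if (l == lock)
			return READ_ONCE(he->node);	/* queue head found */
		if (!l)
			break;	/* empty slot: give up, best effort only */
	}
	return NULL;
}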

Suggested-by: Pan Xinhui <xinhui@xxxxxxxxxxxxxxxxxx>
Signed-off-by: Waiman Long <Waiman.Long@xxxxxxx>
---

v2->v3:
 - Rephrase the changelog and some of the comments to make the
   intention of this patch clearer.
 - Make pv_lookup_hash() architecture dependent to eliminate its cost
   if an architecture doesn't need it. Also limit the number of
   cachelines searched in pv_lookup_hash() to between 1 and 4.
 - [RESEND] Fix a typo.

v1->v2:
 - In pv_wait_node(), look up the hash table for the queue head and
   pass the lock holder CPU number to pv_wait().

[snip]

@@ -282,7 +328,8 @@ static void pv_init_node(struct mcs_spinlock *node)
  * pv_kick_node() is used to set _Q_SLOW_VAL and fill in hash table on its
  * behalf.
  */
-static void pv_wait_node(struct mcs_spinlock *node, struct mcs_spinlock *prev)
+static void pv_wait_node(struct qspinlock *lock, struct mcs_spinlock *node,
+			 struct mcs_spinlock *prev)
 {
 	struct pv_node *pn = (struct pv_node *)node;
 	struct pv_node *pp = (struct pv_node *)prev;
@@ -290,6 +337,8 @@ static void pv_wait_node(struct mcs_spinlock *node, struct mcs_spinlock *prev)
 	int loop;
 	bool wait_early;
 
+	pn->prev_cpu = pp->cpu;	/* Save vCPU # of previous queue entry */
+
 	/* waitcnt processing will be compiled out if !QUEUED_LOCK_STAT */
 	for (;; waitcnt++) {
 		for (wait_early = false, loop = SPIN_THRESHOLD; loop; loop--) {
@@ -314,10 +363,23 @@ static void pv_wait_node(struct mcs_spinlock *node, struct mcs_spinlock *prev)
 		smp_store_mb(pn->state, vcpu_halted);
 
 		if (!READ_ONCE(node->locked)) {
+			struct pv_node *ph;
+
 			qstat_inc(qstat_pv_wait_node, true);
 			qstat_inc(qstat_pv_wait_again, waitcnt);
 			qstat_inc(qstat_pv_wait_early, wait_early);
-			pv_wait(&pn->state, vcpu_halted);
+
+			/*
+			 * If the current queue head is in the hash table,
+			 * the prev_cpu field of its pv_node may contain the
+			 * vCPU # of the lock holder. However, lock stealing
+			 * may make that information inaccurate. Anyway, we
+			 * look up the hash table to try to get the lock
+			 * holder vCPU number.
+			 */
+			ph = pv_lookup_hash(lock);
I am thinking: could we save the hash lookup here if wait_early == true?
Because in that case the previous node is likely to get the lock soon,
it makes sense to yield to that node rather than to the holder, so maybe
we can do something like:

			if (wait_early)
				pv_wait(&pn->state, vcpu_halted, pn->prev_cpu);
			else {
				ph = pv_lookup_hash(lock);
				pv_wait(&pn->state, vcpu_halted,
					ph ? ph->prev_cpu : -1);
			}

Thoughts?

Regards,
Boqun


One reason why I don't want to just wake up the previous node's vCPU is that I am concerned a vCPU may go into a suspend-wakeup loop. So I will wait until we get experimental evidence that this really helps performance before we commit to this kind of change.

Cheers,
Longman