On Wed, 2015-07-22 at 16:12 -0400, Waiman Long wrote:
> Like the native qspinlock, using the pending bit when it is lightly
> loaded to acquire the lock is faster than going through the PV queuing
> process which is even slower than the native queuing process. It also
> avoids loading two additional cachelines (the MCS and PV nodes).
>
> This patch adds the pending bit support for PV qspinlock. The pending
> bit code has a smaller spin threshold (1<<10). It will default back
> to the queuing method if it cannot acquire the lock within a certain
> time limit.
"light loads"? If so, I cannot help but wonder if there is some more
straightforward/ad-hoc way of detecting this, ie some pv_<> function.
That would also save a lot of time as it would not be time based.
Although it might be a more costly call altogether, I dunno.
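Something along these lines is what I had in mind -- entirely
hypothetical, none of these helpers exist, just to illustrate the shape
of it:

/* hypothetical hypervisor hint, stubbed out here just for illustration */
static inline bool hv_vcpu_hint(void) { return true; }

/*
 * Ask the host whether spinning on the pending bit is likely to pay
 * off, instead of inferring "light load" from a fixed spin count.
 */
static __always_inline bool pv_lightly_loaded(void)
{
        /*
         * Could be backed by eg a steal-time or vcpu-preemption hint
         * exposed by the hypervisor.
         */
        return hv_vcpu_hint();
}

Then pv_pending_lock() could simply bail to the queue upfront when
!pv_lightly_loaded(), rather than being time based.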
Some comments about this 'loop' threshold.
> +static int pv_pending_lock(struct qspinlock *lock, u32 val)
> +{
> +        int loop = PENDING_SPIN_THRESHOLD;
> +        u32 new, old;
> +
> +        /*
> +         * wait for in-progress pending->locked hand-overs
> +         */
> +        if (val == _Q_PENDING_VAL) {
> +                while (((val = atomic_read(&lock->val)) == _Q_PENDING_VAL) &&
> +                       loop--)
> +                        cpu_relax();
> +        }
> +
> +        /*
> +         * trylock || pending
> +         */
> +        for (;;) {
> +                if (val & ~_Q_LOCKED_MASK)
> +                        goto queue;
> +                new = _Q_LOCKED_VAL;
> +                if (val == new)
> +                        new |= _Q_PENDING_VAL;
> +                old = atomic_cmpxchg(&lock->val, val, new);
> +                if (old == val)
> +                        break;
> +                if (loop-- <= 0)
> +                        goto queue;
> +        }
So I'm not clear about the semantics of what should occur when the
threshold is exhausted. In the trylock/pending loop above, you
immediately return 0, indicating we want to queue. Ok, but below:
> +...
> +        if (new == _Q_LOCKED_VAL)
> +                goto gotlock;
> +        /*
> +         * We are pending, wait for the owner to go away.
> +         */
> +        while (((val = smp_load_acquire(&lock->val.counter)) & _Q_LOCKED_MASK)
> +               && (loop-- > 0))
> +                cpu_relax();
> +
> +        if (!(val & _Q_LOCKED_MASK)) {
> +                clear_pending_set_locked(lock);
> +                goto gotlock;
> +        }
> +        /*
> +         * Clear the pending bit and fall back to queuing
> +         */
> +        clear_pending(lock);
... you call clear_pending() before returning. Is this intentional? Smells
fishy.
And basically afaict all this chunk of code does is spin until loop is
exhausted, and break out when we get the lock. Ie, something like this
is a lot cleaner:
        while (loop-- > 0) {
                /*
                 * We are pending, wait for the owner to go away.
                 */
                val = smp_load_acquire(&lock->val.counter);
                if (!(val & _Q_LOCKED_MASK)) {
                        clear_pending_set_locked(lock);
                        goto gotlock;
                }
                cpu_relax();
        }
        /*
         * Clear the pending bit and fall back to queuing
         */
        clear_pending(lock);
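
(The explicit loop-- > 0 is deliberate: afaict loop can already have
gone negative in the hand-over loop above, and a bare while (loop--)
could then spin close to forever.)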