On Thu, Nov 05, 2015 at 11:06:48AM -0500, Waiman Long wrote:
This is all good information to have in the Changelog. And since theseHow does it affect IVB-EX (which you were testing earlier IIRC)?My testing on IVB-EX indicated that if the critical section is really short,
the change may actually slow thing a bit in some cases. However, when the
critical section is long enough that the prefetch overhead can be hidden
within the lock acquisition loop, there will be a performance boost.
Observing next implies val != tail, but the reverse may not be true. The@@ -426,6 +437,15 @@ queue:This however appears an independent optimization. Is it worth it? Would
cpu_relax();
/*
+ * If the next pointer is defined, we are not tail anymore.
+ * In this case, claim the spinlock& release the MCS lock.
+ */
+ if (next) {
+ set_locked(lock);
+ goto mcs_unlock;
+ }
+
+ /*
* claim the lock:
*
* n,0,0 -> 0,0,1 : lock, uncontended
@@ -458,6 +478,7 @@ queue:
while (!(next = READ_ONCE(node->next)))
cpu_relax();
+mcs_unlock:
arch_mcs_spin_unlock_contended(&next->locked);
pv_kick_node(lock, next);
we not already have observed a val != tail in this case? At which point
we're just adding extra code for no gain.
That is, if we observe @next, must we then not also observe val != tail?
branch is done before we observe val != tail. Yes, it is an optimization to
avoid reading node->next again if we have already observed next. I have
observed a very minor performance boost with that change without the
prefetch.
are two independent changes, two patches would have been the right
format.