On Fri, Apr 18, 2014 at 01:32:47PM -0400, Waiman Long wrote:
> On 04/18/2014 04:15 AM, Peter Zijlstra wrote:
> > On Thu, Apr 17, 2014 at 05:28:17PM -0400, Waiman Long wrote:
> > > On 04/17/2014 11:49 AM, Peter Zijlstra wrote:
> > > > On Thu, Apr 17, 2014 at 11:03:56AM -0400, Waiman Long wrote:
> > > > > @@ -192,36 +220,25 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
> > > > >  	node->next = NULL;
> > > > >
> > > > >  	/*
> > > > > +	 * We touched a (possibly) cold cacheline; attempt the trylock once
> > > > > +	 * more in the hope someone let go while we weren't watching as long
> > > > > +	 * as no one was queuing.
> > > > >  	 */
> > > > > +	if (!(val & _Q_TAIL_MASK) && queue_spin_trylock(lock))
> > > > > +		goto release;
> > > >
> > > > But you just did a potentially very expensive op; @val isn't
> > > > representative anymore!
> > >
> > > That is not true. I pass in a pointer to val to trylock_pending() (the
> > > pointer thing) so that it will store the latest value that it reads from
> > > the lock back into val. I did miss one in the PV qspinlock exit loop. I
> > > will add it back when I do the next version.
> >
> > But you did that read _before_ you touched a cold cacheline, that's 100s
> > of cycles. Whatever value you read back then is now complete nonsense.
>
> For spin_lock(), the lock cacheline is touched by a cmpxchg(). It can take
> 100s of cycles whether it is hot or cold.

It's not the lock cacheline, you just touched the per-cpu node cacheline
for the first time, setting up the node.
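
To make the point concrete outside the kernel sources, here is a minimal
user-space sketch in C11 atomics of the pattern being argued about. The names
(toy_lock, toy_trylock, toy_slowpath) and the scenario in main() are invented
for illustration; this is not the qspinlock code itself, only a demonstration
that a lock-word snapshot taken before an expensive step (here, initializing
the queue node) can be stale by the time it is consulted, which is why the
slowpath re-attempts the lock with a fresh atomic operation instead of
trusting the old value.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct toy_node {                    /* stand-in for the per-cpu MCS node */
	struct toy_node *next;
	char pad[64];                /* pretend this sits on its own cacheline */
};

static _Atomic unsigned int toy_lock;        /* 0 = unlocked, 1 = locked */

static bool toy_trylock(void)
{
	unsigned int expected = 0;
	/* fresh read-modify-write of the lock word, like queue_spin_trylock() */
	return atomic_compare_exchange_strong(&toy_lock, &expected, 1);
}

static void toy_slowpath(unsigned int val, struct toy_node *node)
{
	/*
	 * Setting up the node touches a (possibly) cold cacheline; by the
	 * time this finishes, the snapshot in val may no longer describe
	 * the lock word.
	 */
	memset(node, 0, sizeof(*node));
	node->next = NULL;

	/*
	 * So don't trust the stale snapshot: issue a fresh trylock in the
	 * hope the owner let go while we were busy setting up.
	 */
	if (toy_trylock()) {
		printf("re-attempt took the lock; stale val said %u (locked)\n", val);
		atomic_store(&toy_lock, 0);
		return;
	}
	printf("still contended; a real slowpath would queue here\n");
}

int main(void)
{
	struct toy_node node;
	unsigned int val;

	atomic_store(&toy_lock, 1);          /* someone holds the lock ...    */
	val = atomic_load(&toy_lock);        /* ... and we snapshot it early  */
	atomic_store(&toy_lock, 0);          /* owner releases while we set up */

	toy_slowpath(val, &node);
	return 0;
}

Built with a C11 compiler (e.g. cc -std=c11), the run shows the re-attempt
succeeding even though the stale snapshot still claims the lock is held.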