Re: [PATCH v4 1/3] qspinlock: Introducing a 4-byte queue spinlock implementation

From: Waiman Long
Date: Wed Feb 19 2014 - 14:26:34 EST


On 02/19/2014 03:52 AM, Peter Zijlstra wrote:
> On Tue, Feb 18, 2014 at 07:50:13PM -0500, Waiman Long wrote:
>> On 02/18/2014 04:34 PM, Peter Zijlstra wrote:
>>> On Tue, Feb 18, 2014 at 02:39:31PM -0500, Waiman Long wrote:
>>>> The #ifdef is harder to take away here. The point is that doing a 32-bit
>>>> exchange may accidentally steal the lock, which requires additional code
>>>> to handle. Doing a 16-bit exchange, on the other hand, will never steal the
>>>> lock and so doesn't need the extra handling code. I could construct a
>>>> function with different return values to handle the different cases if you
>>>> think it will make the code easier to read.
>>> Does it really pay to use xchg() with all those fixup cases? Why not
>>> have a single cmpxchg() loop that does just the exact atomic op you
>>> want?
>> The main reason for using xchg instead of cmpxchg is its performance impact
>> when the lock is heavily contended. Under those circumstances, a task may
>> need several tries of read + atomic RMW before getting it right, which
>> may cause a lot of cacheline contention. With xchg, we need at most 2 atomic
>> ops. Using cmpxchg() does simplify the code a bit, at the expense of
>> performance under heavy contention.
> Have you actually measured this?

I haven't actually measured that myself. It is mostly based on my experience. I could do some timing experiments with the cmpxchg() change and report back to you later.
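One way to start answering Peter's question is to count how often a cmpxchg() retry loop actually retries under contention. The following is a minimal userspace sketch, not the qspinlock code: it assumes a hypothetical 32-bit word whose low 16 bits hold lock/pending state and whose high 16 bits hold a queue "tail" code, and it uses C11 <stdatomic.h> and POSIX threads in place of the kernel's xchg()/cmpxchg(). The names set_tail_cmpxchg(), run_cmpxchg_experiment(), LOW_MASK and MAX_THREADS are illustrative only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout (not the patch's): bits 0-15 hold lock/pending
 * state, bits 16-31 hold a queue "tail" code. */
#define LOW_MASK    0xffffu
#define MAX_THREADS 64

static _Atomic uint32_t lock_word;
static atomic_ulong cmpxchg_tries;   /* total compare-exchange attempts */

/* Publish new_tail while preserving the low 16 bits, using a
 * read + cmpxchg retry loop. Every failed attempt is another atomic
 * RMW on the same cacheline. The strong form is used so that a failure
 * always means another thread changed the word, as with cmpxchg(). */
static uint32_t set_tail_cmpxchg(uint32_t new_tail)
{
    uint32_t old = atomic_load_explicit(&lock_word, memory_order_relaxed);
    unsigned long tries = 0;

    for (;;) {
        uint32_t desired = (old & LOW_MASK) | (new_tail << 16);

        tries++;
        /* On failure, 'old' is reloaded with the current value. */
        if (atomic_compare_exchange_strong_explicit(&lock_word, &old, desired,
                                                    memory_order_acq_rel,
                                                    memory_order_relaxed))
            break;
    }
    atomic_fetch_add_explicit(&cmpxchg_tries, tries, memory_order_relaxed);
    return old >> 16;   /* previous tail code */
}

struct worker_arg {
    uint32_t tail;          /* nonzero tail code this thread publishes */
    unsigned long iters;
};

static void *worker(void *p)
{
    const struct worker_arg *a = p;

    for (unsigned long i = 0; i < a->iters; i++)
        set_tail_cmpxchg(a->tail);
    return NULL;
}

/* Start nthreads workers that all update the same word and return the
 * total number of compare-exchange attempts. An xchg-based path needs
 * one RMW (at most 2 with fixup) per update regardless of contention,
 * so any excess over nthreads * iters is cmpxchg retry overhead. */
static unsigned long run_cmpxchg_experiment(int nthreads, unsigned long iters,
                                            uint32_t low_bits)
{
    pthread_t tid[MAX_THREADS];
    struct worker_arg args[MAX_THREADS];

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    atomic_store(&lock_word, low_bits & LOW_MASK);
    atomic_store(&cmpxchg_tries, 0);

    for (int i = 0; i < nthreads; i++) {
        args[i].tail = (uint32_t)i + 1;
        args[i].iters = iters;
        pthread_create(&tid[i], NULL, worker, &args[i]);
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);

    return atomic_load(&cmpxchg_tries);
}
```

For example, run_cmpxchg_experiment(4, 100000, 0) returns at least 400000 on any machine; the amount above that is the number of retried cmpxchg operations. A retry count is only a proxy for cacheline traffic, so wall-clock timing of the real lock under contention would still be needed to settle the question.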

-Longman