Re: [PATCH RFC 1/3] mutex: Make more scalable by doing less atomic operations

From: Waiman Long
Date: Mon Apr 08 2013 - 13:54:07 EST

On 04/08/2013 10:38 AM, Linus Torvalds wrote:
> On Mon, Apr 8, 2013 at 5:42 AM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
> > AFAICS the main performance trade-off is the following: when the owner CPU unlocks
> > the mutex, we'll poll it via a read first, which turns the cacheline into
> > shared-read MESI state. Then we notice that its content signals 'lock is
> > available', and we attempt the trylock again.
> >
> > This increases lock latency in the few-contended-tasks case slightly - and we'd
> > like to know by precisely how much, not just for a generic '10-100 users' case
> > which does not tell much about the contention level.
>
> We had this problem for *some* lock where we used a "read + cmpxchg"
> in the hotpath and it caused us problems due to two cacheline state
> transitions (first to shared, then to exclusive). It was faster to
> just assume it was unlocked and try to do an immediate cmpxchg.
>
> But iirc it is a non-issue for this case, because this is only about
> the contended slow path.
>
> I forget where we saw the case where we should *not* read the initial
> value, though. Anybody remember?
>
> That said, the MUTEX_SHOULD_XCHG_COUNT macro should die. Why shouldn't
> all architectures just consider negative counts to be locked? It
> doesn't matter that some might only ever see -1.

I think so too. However, I don't have the machines to test the other architectures. The MUTEX_SHOULD_XCHG_COUNT macro is just a safety measure to make sure that my code won't break the kernel on other architectures. Once it is confirmed that a negative count other than -1 is fine on all the other architectures, the macro can certainly go.
