[tip:locking/core] locking/rtmutex: Drop usage of __HAVE_ARCH_CMPXCHG

From: tip-bot for Sebastian Andrzej Siewior
Date: Wed May 13 2015 - 17:04:44 EST

Next message: Eric Dumazet: "Re: regression with today's linux-next"
Previous message: Thomas Gleixner: "Re: [v4 1/3] genirq: Introduce irq_set_vcpu_affinity() to target an interrupt to a VCPU"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Commit-ID: cede88418b385b50f6841e4b2f1586888b8ab924
Gitweb: http://git.kernel.org/tip/cede88418b385b50f6841e4b2f1586888b8ab924
Author: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
AuthorDate: Wed, 25 Feb 2015 18:56:13 +0100
Committer: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
CommitDate: Wed, 13 May 2015 10:51:28 +0200

locking/rtmutex: Drop usage of __HAVE_ARCH_CMPXCHG

The rtmutex code is the only user of __HAVE_ARCH_CMPXCHG and we have a few
other user of cmpxchg() which do not care about __HAVE_ARCH_CMPXCHG. This
define was first introduced in 23f78d4a0 ("[PATCH] pi-futex: rt mutex core")
which is v2.6.18. The generic cmpxchg was introduced later in 068fbad288
("Add cmpxchg_local to asm-generic for per cpu atomic operations") which is
v2.6.25.
Back then something was required to get rtmutex working with the fast
path on architectures without cmpxchg and this seems to be the result.

It popped up recently on rt-users because ARM (v6+) does not define
__HAVE_ARCH_CMPXCHG (even that it implements it) which results in slower
locking performance in the fast path.
To put some numbers on it: preempt -RT, am335x, 10 loops of
100000 invocations of rt_spin_lock() + rt_spin_unlock() (time "total" is
the average of the 10 loops for the 100000 invocations, "loop" is
"total / 100000 * 1000"):

cmpxchg | slowpath used || cmpxchg used
| total | loop || total | loop
--------|-----------|-------||------------|-------
ARMv6 | 9129.4 us | 91 ns || 3311.9 us | 33 ns
generic | 9360.2 us | 94 ns || 10834.6 us | 108 ns
----------------------------||--------------------

Forcing it to generic cmpxchg() made things worse for the slowpath and
even worse in cmpxchg() path. It boils down to 14ns more per lock+unlock
in a cache hot loop so it might not be that much in real world.
The last test was a substitute for pre ARMv6 machine but then I was able
to perform the comparison on imx28 which is ARMv5 and therefore is
always is using the generic cmpxchg implementation. And the numbers:

| total | loop
-------- |----------- |--------
slowpath | 263937.2 us | 2639 ns
cmpxchg | 16934.2 us | 169 ns
--------------------------------

The numbers are larger since the machine is slower in general. However,
letting rtmutex use cmpxchg() instead the slowpath seem to improve things.

Since from the ARM (tested on am335x + imx28) point of view always
using cmpxchg() in rt_mutex_lock() + rt_mutex_unlock() makes sense I
would drop the define.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
Cc: Arnd Bergmann <arnd@xxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: will.deacon@xxxxxxx
Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
Link: http://lkml.kernel.org/r/20150225175613.GE6823@xxxxxxxxxxxxx
Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
---
include/asm-generic/cmpxchg.h | 3 ---
kernel/locking/rtmutex.c | 6 +++---
2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/include/asm-generic/cmpxchg.h b/include/asm-generic/cmpxchg.h
index 811fb1e..3766ab3 100644
--- a/include/asm-generic/cmpxchg.h
+++ b/include/asm-generic/cmpxchg.h
@@ -86,9 +86,6 @@ unsigned long __xchg(unsigned long x, volatile void *ptr, int size)

/*
* Atomic compare and exchange.
- *
- * Do not define __HAVE_ARCH_CMPXCHG because we want to use it to check whether
- * a cmpxchg primitive faster than repeated local irq save/restore exists.
*/
#include <asm-generic/cmpxchg-local.h>

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index b732793..27dff66 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -70,10 +70,10 @@ static void fixup_rt_mutex_waiters(struct rt_mutex *lock)
}

/*
- * We can speed up the acquire/release, if the architecture
- * supports cmpxchg and if there's no debugging state to be set up
+ * We can speed up the acquire/release, if there's no debugging state to be
+ * set up.
*/
-#if defined(__HAVE_ARCH_CMPXCHG) && !defined(CONFIG_DEBUG_RT_MUTEXES)
+#ifndef CONFIG_DEBUG_RT_MUTEXES
# define rt_mutex_cmpxchg(l,c,n) (cmpxchg(&l->owner, c, n) == c)
static inline void mark_rt_mutex_waiters(struct rt_mutex *lock)
{
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Eric Dumazet: "Re: regression with today's linux-next"
Previous message: Thomas Gleixner: "Re: [v4 1/3] genirq: Introduce irq_set_vcpu_affinity() to target an interrupt to a VCPU"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]