Re: [cpuops cmpxchg V2 5/5] cpuops: Use cmpxchg for xchg to avoid lock semantics

From: Eric Dumazet
Date: Tue Dec 14 2010 - 11:44:45 EST


On Tuesday, 14 December 2010 at 10:28 -0600, Christoph Lameter wrote:
> plain text document attachment (cpuops_xchg_with_cmpxchg)
> Use cmpxchg instead of xchg to realize this_cpu_xchg.
>
> xchg will cause LOCK overhead since LOCK is always implied but cmpxchg
> will not.
>
> Baselines:
>
> xchg() = 18 cycles (no segment prefix, LOCK semantics)
> __this_cpu_xchg = 1 cycle
>
> (simulated using this_cpu_read/write, two prefixes. Looks like the
> cpu can use loop optimization to get rid of most of the overhead)
>
> Cycles before:
>
> this_cpu_xchg = 37 cycles (segment prefix and LOCK (implied by xchg))
>
> After:
>
> this_cpu_xchg = 11 cycles (using cmpxchg without lock semantics)
>
> Signed-off-by: Christoph Lameter <cl@xxxxxxxxx>
>
> ---
> arch/x86/include/asm/percpu.h | 21 +++++++++++++++------
> 1 file changed, 15 insertions(+), 6 deletions(-)
>
> Index: linux-2.6/arch/x86/include/asm/percpu.h
> ===================================================================
> --- linux-2.6.orig/arch/x86/include/asm/percpu.h 2010-12-10 12:46:31.000000000 -0600
> +++ linux-2.6/arch/x86/include/asm/percpu.h 2010-12-10 13:25:21.000000000 -0600
> @@ -213,8 +213,9 @@ do { \
> })
>
> /*
> - * Beware: xchg on x86 has an implied lock prefix. There will be the cost of
> - * full lock semantics even though they are not needed.
> + * xchg is implemented using cmpxchg without a lock prefix. xchg is
> + * expensive due to the implied lock prefix. The processor cannot prefetch
> + * cachelines if xchg is used.
> */
> #define percpu_xchg_op(var, nval) \
> ({ \
> @@ -222,25 +223,33 @@ do { \
> typeof(var) __new = (nval); \
> switch (sizeof(var)) { \
> case 1: \
> - asm("xchgb %2, "__percpu_arg(1) \
> + asm("\n1:mov "__percpu_arg(1)",%%al" \
> + "\n\tcmpxchgb %2, "__percpu_arg(1) \
> + "\n\tjnz 1b" \


A failed cmpxchg already loads the current value into al/ax/eax/rax, so
you can take advantage of that and keep the initial mov outside the loop:

"\n\tmov "__percpu_arg(1)",%%al"
"\n1:\tcmpxchgb %2, "__percpu_arg(1)
"\n\tjnz 1b"

(No need to reload the value again)
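
The same idea can be sketched in portable C11, purely as an illustration and
not as the kernel code. The helper name xchg_via_cmpxchg is hypothetical, and
C11 atomics are used here only as a stand-in: unlike the unlocked per-cpu
cmpxchg in the patch, these operations are fully atomic. The point shown is
that a failed compare-exchange writes the current value back into the
"expected" variable, so the load happens once, before the loop.

```c
#include <stdatomic.h>

/*
 * Hypothetical helper (not from the patch): emulate an exchange with a
 * compare-exchange loop. The initial load is done once; every failed
 * compare-exchange refreshes 'old' with the value currently in *ptr,
 * so the loop body never needs to reload it explicitly.
 */
static int xchg_via_cmpxchg(_Atomic int *ptr, int new_val)
{
	int old = atomic_load_explicit(ptr, memory_order_relaxed);

	while (!atomic_compare_exchange_weak_explicit(ptr, &old, new_val,
						      memory_order_relaxed,
						      memory_order_relaxed))
		; /* 'old' was updated by the failed attempt; just retry */

	return old; /* value that was replaced */
}
```

Calling xchg_via_cmpxchg(&slot, 9) on a slot holding 5 would return 5 and
leave 9 in the slot, which mirrors what percpu_xchg_op is expected to do.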


> : "=a" (__ret), "+m" (var) \
> : "q" (__new) \
> : "memory"); \
> break; \
> case 2: \
> - asm("xchgw %2, "__percpu_arg(1) \
> + asm("\n1:mov "__percpu_arg(1)",%%ax" \
> + "\n\tcmpxchgw %2, "__percpu_arg(1) \
> + "\n\tjnz 1b" \
> : "=a" (__ret), "+m" (var) \
> : "r" (__new) \
> : "memory"); \
> break; \
> case 4: \
> - asm("xchgl %2, "__percpu_arg(1) \
> + asm("\n1:mov "__percpu_arg(1)",%%eax" \
> + "\n\tcmpxchgl %2, "__percpu_arg(1) \
> + "\n\tjnz 1b" \
> : "=a" (__ret), "+m" (var) \
> : "r" (__new) \
> : "memory"); \
> break; \
> case 8: \
> - asm("xchgq %2, "__percpu_arg(1) \
> + asm("\n1:mov "__percpu_arg(1)",%%rax" \
> + "\n\tcmpxchgq %2, "__percpu_arg(1) \
> + "\n\tjnz 1b" \
> : "=a" (__ret), "+m" (var) \
> : "r" (__new) \
> : "memory"); \
>

