[PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg

From: Uros Bizjak
Date: Wed Apr 05 2023 - 10:18:17 EST


Add generic and target-specific support for local{,64}_try_cmpxchg
and wire up support for all targets that use the local_t infrastructure.

The patch series enables x86 targets to emit a special instruction for
local_try_cmpxchg, and also for local64_try_cmpxchg on x86_64.

The last patch changes __perf_output_begin in kernel/events/ring_buffer.c
to use the new locking primitive and improves the generated code from

    4b3:   48 8b 82 e8 00 00 00    mov    0xe8(%rdx),%rax
    4ba:   48 8b b8 08 04 00 00    mov    0x408(%rax),%rdi
    4c1:   8b 42 1c                mov    0x1c(%rdx),%eax
    4c4:   48 8b 4a 28             mov    0x28(%rdx),%rcx
    4c8:   85 c0                   test   %eax,%eax
    ...
    4ef:   48 89 c8                mov    %rcx,%rax
    4f2:   48 0f b1 7a 28          cmpxchg %rdi,0x28(%rdx)
    4f7:   48 39 c1                cmp    %rax,%rcx
    4fa:   75 b7                   jne    4b3 <...>

to

    4b2:   48 8b 4a 28             mov    0x28(%rdx),%rcx
    4b6:   48 8b 82 e8 00 00 00    mov    0xe8(%rdx),%rax
    4bd:   48 8b b0 08 04 00 00    mov    0x408(%rax),%rsi
    4c4:   8b 42 1c                mov    0x1c(%rdx),%eax
    4c7:   85 c0                   test   %eax,%eax
    ...
    4d4:   48 89 c8                mov    %rcx,%rax
    4d7:   48 0f b1 72 28          cmpxchg %rsi,0x28(%rdx)
    4dc:   0f 85 d0 00 00 00       jne    5b2 <...>
    ...
    5b2:   48 89 c1                mov    %rax,%rcx
    5b5:   e9 fc fe ff ff          jmp    4b6 <...>

Please note that, in addition to the removed compare, the load from
0x28(%rdx) is moved out of the loop and the code is rearranged
according to the likely/unlikely tags in the source.
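
For illustration only, the two loop shapes can be sketched in userspace
C. This is a rough analogue, not the kernel code: the sketch_* helpers
and reserve_* functions are hypothetical names, and C11 atomics stand in
for the local_t primitives (they are fully atomic, whereas local_t
operations only need to be atomic with respect to the current CPU).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cmpxchg()-style primitive:
 * returns the value that was found at *ptr. */
static inline int64_t sketch_cmpxchg(_Atomic int64_t *ptr,
				     int64_t old, int64_t new_val)
{
	/* On failure, 'old' is overwritten with the current value;
	 * on success it already equals the value that was found. */
	atomic_compare_exchange_strong(ptr, &old, new_val);
	return old;
}

/* Hypothetical stand-in for a try_cmpxchg()-style primitive:
 * returns success and updates *old with the current value on failure. */
static inline bool sketch_try_cmpxchg(_Atomic int64_t *ptr,
				      int64_t *old, int64_t new_val)
{
	return atomic_compare_exchange_strong(ptr, old, new_val);
}

/* Old shape: reload the head on every iteration, then compare the
 * returned value against the expected one. */
static inline int64_t reserve_with_cmpxchg(_Atomic int64_t *head,
					   int64_t size)
{
	int64_t offset, next;

	do {
		offset = atomic_load(head);
		next = offset + size;
	} while (sketch_cmpxchg(head, offset, next) != offset);

	return offset;
}

/* New shape: load once before the loop; a failed attempt already
 * leaves the fresh value in 'offset', so no reload or extra compare
 * is needed. */
static inline int64_t reserve_with_try_cmpxchg(_Atomic int64_t *head,
						int64_t size)
{
	int64_t offset = atomic_load(head);
	int64_t next;

	do {
		next = offset + size;
	} while (!sketch_try_cmpxchg(head, &offset, next));

	return offset;
}
```

Both functions return the offset that was reserved; only the second one
lets the compiler branch directly on the flags set by cmpxchg.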
---
v2:

Implement target-specific support for local_try_cmpxchg and
local_cmpxchg using typed C wrappers that call their _local
counterparts and provide additional checking of their input
arguments.

Cc: Richard Henderson <richard.henderson@xxxxxxxxxx>
Cc: Ivan Kokshaysky <ink@xxxxxxxxxxxxxxxxxxxx>
Cc: Matt Turner <mattst88@xxxxxxxxx>
Cc: Huacai Chen <chenhuacai@xxxxxxxxxx>
Cc: WANG Xuerui <kernel@xxxxxxxxxx>
Cc: Thomas Bogendoerfer <tsbogend@xxxxxxxxxxxxxxxx>
Cc: Michael Ellerman <mpe@xxxxxxxxxxxxxx>
Cc: Nicholas Piggin <npiggin@xxxxxxxxx>
Cc: Christophe Leroy <christophe.leroy@xxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Borislav Petkov <bp@xxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
Cc: Arnd Bergmann <arnd@xxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
Cc: Mark Rutland <mark.rutland@xxxxxxx>
Cc: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>
Cc: Jiri Olsa <jolsa@xxxxxxxxxx>
Cc: Namhyung Kim <namhyung@xxxxxxxxxx>
Cc: Ian Rogers <irogers@xxxxxxxxxx>
Cc: Will Deacon <will@xxxxxxxxxx>
Cc: Boqun Feng <boqun.feng@xxxxxxxxx>
Cc: Jiaxun Yang <jiaxun.yang@xxxxxxxxxxx>
Cc: Jun Yi <yijun@xxxxxxxxxxx>

Uros Bizjak (5):
  locking/atomic: Add generic try_cmpxchg{,64}_local support
  locking/generic: Wire up local{,64}_try_cmpxchg
  locking/arch: Wire up local_try_cmpxchg
  locking/x86: Define arch_try_cmpxchg_local
  events: Illustrate the transition to local{,64}_try_cmpxchg

 arch/alpha/include/asm/local.h              | 12 +++++++++--
 arch/loongarch/include/asm/local.h          | 13 +++++++++--
 arch/mips/include/asm/local.h               | 13 +++++++++--
 arch/powerpc/include/asm/local.h            | 11 ++++++++++
 arch/x86/events/core.c                      |  9 ++++----
 arch/x86/include/asm/cmpxchg.h              |  6 ++++++
 arch/x86/include/asm/local.h                | 13 +++++++++--
 include/asm-generic/local.h                 |  1 +
 include/asm-generic/local64.h               | 12 ++++++++++-
 include/linux/atomic/atomic-arch-fallback.h | 24 ++++++++++++++++++++-
 include/linux/atomic/atomic-instrumented.h  | 20 ++++++++++++++++-
 kernel/events/ring_buffer.c                 |  5 +++--
 scripts/atomic/gen-atomic-fallback.sh       |  4 ++++
 scripts/atomic/gen-atomic-instrumented.sh   |  2 +-
 14 files changed, 126 insertions(+), 19 deletions(-)

--
2.39.2