On Fri, Jan 06, 2023 at 03:11:58PM +0100, Jules Maselbas wrote:

Done for V2 (which we will send pretty soon).
Hi Mark,
On Wed, Jan 04, 2023 at 09:53:24AM +0000, Mark Rutland wrote:
On Tue, Jan 03, 2023 at 05:43:39PM +0100, Yann Sionneau wrote:
Add common headers (atomic, bitops, barrier and locking) for basic
kvx support.

This is a plain read, and is *not* atomic.

Good point, we are going to use {READ,WRITE}_ONCE macros.
CC: Will Deacon <will@xxxxxxxxxx>
CC: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
CC: Boqun Feng <boqun.feng@xxxxxxxxx>
CC: Mark Rutland <mark.rutland@xxxxxxx>
CC: linux-kernel@xxxxxxxxxxxxxxx
Co-developed-by: Clement Leger <clement.leger@xxxxxxxxxxx>
Signed-off-by: Clement Leger <clement.leger@xxxxxxxxxxx>
Co-developed-by: Jules Maselbas <jmaselbas@xxxxxxxxx>
Signed-off-by: Jules Maselbas <jmaselbas@xxxxxxxxx>
Co-developed-by: Julian Vetter <jvetter@xxxxxxxxx>
Signed-off-by: Julian Vetter <jvetter@xxxxxxxxx>
Co-developed-by: Julien Villette <jvillette@xxxxxxxxx>
Signed-off-by: Julien Villette <jvillette@xxxxxxxxx>
Co-developed-by: Yann Sionneau <ysionneau@xxxxxxxxx>
Signed-off-by: Yann Sionneau <ysionneau@xxxxxxxxx>
---
arch/kvx/include/asm/atomic.h | 104 +++++++++++++++++
arch/kvx/include/asm/barrier.h | 15 +++
arch/kvx/include/asm/bitops.h | 207 +++++++++++++++++++++++++++++++++
arch/kvx/include/asm/bitrev.h | 32 +++++
arch/kvx/include/asm/cmpxchg.h | 185 +++++++++++++++++++++++++++++
5 files changed, 543 insertions(+)
create mode 100644 arch/kvx/include/asm/atomic.h
create mode 100644 arch/kvx/include/asm/barrier.h
create mode 100644 arch/kvx/include/asm/bitops.h
create mode 100644 arch/kvx/include/asm/bitrev.h
create mode 100644 arch/kvx/include/asm/cmpxchg.h
diff --git a/arch/kvx/include/asm/atomic.h b/arch/kvx/include/asm/atomic.h
new file mode 100644
index 000000000000..eb8acbcbc70d
--- /dev/null
+++ b/arch/kvx/include/asm/atomic.h
@@ -0,0 +1,104 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2017-2023 Kalray Inc.
+ * Author(s): Clement Leger
+ */
+
+#ifndef _ASM_KVX_ATOMIC_H
+#define _ASM_KVX_ATOMIC_H
+
+#include <linux/types.h>
+
+#include <asm/cmpxchg.h>
+
+#define ATOMIC64_INIT(i) { (i) }
+
+#define arch_atomic64_cmpxchg(v, old, new) (arch_cmpxchg(&((v)->counter), old, new))
+#define arch_atomic64_xchg(v, new) (arch_xchg(&((v)->counter), new))
+
+static inline long arch_atomic64_read(const atomic64_t *v)
+{
+ return v->counter;
+}
The compiler can replay a plain read an arbitrary number of times, and is
permitted to split it into smaller accesses.
At minimum this needs to be
READ_ONCE(v->counter)
... which will prevent replay. Whether or not that's actually atomic will
depend on the instructions the compiler generates, and how those instructions
are defined in your architecture.
Ack for doing it later.

Perfect, thanks for confirming!
Do you have a single instruction that can read a 64-bit memory location, and is
it guaranteed to result in a single access that cannot be split?

We do have a single instruction that can read a 64-bit memory location
(supported sizes are 8, 16, 32, 64, 128, 256).
All accesses are guaranteed to not be split, unless they are misaligned.
Furthermore, misaligned write accesses crossing a 32-byte boundary may
complete in a non-atomic way.
[...]
+static inline void arch_atomic64_set(atomic64_t *v, long i)
+{
+ v->counter = i;
+}

Same comments as for arch_atomic64_read(); at minimum this needs to be:
WRITE_ONCE(v->counter, i)
... but that may or may not actually be atomic on your architecture.
+#define ATOMIC64_RETURN_OP(op, c_op) \
+static inline long arch_atomic64_##op##_return(long i, atomic64_t *v) \
+{ \
+ long new, old, ret; \
+ \
+ do { \
+ old = v->counter; \
+ new = old c_op i; \
+ ret = arch_cmpxchg(&v->counter, old, new); \
+ } while (ret != old); \
+ \
+ return new; \
+}
+
+#define ATOMIC64_OP(op, c_op) \
+static inline void arch_atomic64_##op(long i, atomic64_t *v) \
+{ \
+ long new, old, ret; \
+ \
+ do { \
+ old = v->counter; \
+ new = old c_op i; \
+ ret = arch_cmpxchg(&v->counter, old, new); \
+ } while (ret != old); \
+}
+
+#define ATOMIC64_FETCH_OP(op, c_op) \
+static inline long arch_atomic64_fetch_##op(long i, atomic64_t *v) \
+{ \
+ long new, old, ret; \
+ \
+ do { \
+ old = v->counter; \
+ new = old c_op i; \
+ ret = arch_cmpxchg(&v->counter, old, new); \
+ } while (ret != old); \
+ \
+ return old; \
+}
These look ok, but it'd be nicer if we could teach the generic atomic64 code to
do this, like the generic atomic code does.
We could rename the existing asm-generic/atomic64 code to atomic64-spinlock,
and add a separate atomic64-cmpxchg (and likewise for the 32-bit code) to make
that clearer and consistent.
I am not sure what this implies and how big this change might be,
but I'll take a look at this.

Hmm... from a quick attempt just now it looks like that will be a bit more
churny than I thought.
We can always factor this out later, so feel free to leave it as-is, though if
we could make this generic (and have it look like asm-generic/atomic.h), it'd be
nice for consistency and maintenance.
The 32-bit atomics should come from asm-generic/atomic.h and should not be
necessary.

Likewise for these two.
+#define ATOMIC64_OPS(op, c_op) \
+ ATOMIC64_OP(op, c_op) \
+ ATOMIC64_RETURN_OP(op, c_op) \
+ ATOMIC64_FETCH_OP(op, c_op)
+
+ATOMIC64_OPS(and, &)
+ATOMIC64_OPS(or, |)
+ATOMIC64_OPS(xor, ^)
+ATOMIC64_OPS(add, +)
+ATOMIC64_OPS(sub, -)
+
+#undef ATOMIC64_OPS
+#undef ATOMIC64_FETCH_OP
+#undef ATOMIC64_OP
+
+static inline int arch_atomic_add_return(int i, atomic_t *v)
+{
+ int new, old, ret;
+
+ do {
+ old = v->counter;
+ new = old + i;
+ ret = arch_cmpxchg(&v->counter, old, new);
+ } while (ret != old);
+
+ return new;
+}
+
+static inline int arch_atomic_sub_return(int i, atomic_t *v)
+{
+ return arch_atomic_add_return(-i, v);
+}
asm-generic/atomic.h already has both arch_atomic_add_return() and
arch_atomic_sub_return().
Why do you need to specify the exact registers?

r62 and r63 are hardcoded in the inline assembly, they are caller saved.

diff --git a/arch/kvx/include/asm/cmpxchg.h b/arch/kvx/include/asm/cmpxchg.h
new file mode 100644
index 000000000000..b1d128b060a2
--- /dev/null
+++ b/arch/kvx/include/asm/cmpxchg.h
@@ -0,0 +1,185 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2017-2023 Kalray Inc.
+ * Author(s): Clement Leger
+ * Yann Sionneau
+ */
+
+#ifndef _ASM_KVX_CMPXCHG_H
+#define _ASM_KVX_CMPXCHG_H
+
+#include <linux/bits.h>
+#include <linux/types.h>
+#include <linux/build_bug.h>
+
+/*
+ * On kvx, we have a boolean compare and swap, which means that the operation
+ * only returns the success of the operation.
+ * If the operation succeeds, this is simple: we just need to return the
+ * provided old value. However, if it fails, we need to load the value to
+ * return it to the caller. If the loaded value is different from the "old"
+ * provided by the caller, we can return it since it means the operation
+ * failed. However, if for some reason the value we read is equal to the old
+ * value provided by the caller, we can't simply return it or the caller will
+ * think it succeeded. So if the value we read is the same as the "old"
+ * provided by the caller, we try again until either we succeed or we fail
+ * with a different value than the provided one.
+ */
+#define __cmpxchg(ptr, old, new, op_suffix, load_suffix) \
+({ \
+ register unsigned long __rn asm("r62"); \
+ register unsigned long __ro asm("r63"); \
I have a C implementation that uses builtins; however, this is not merged
in our tree yet (but I want to).
e.g. does some instruction use these implicitly, or do you need two adjacent
registers for encoding reasons?

The atomic compare and swap (acswap) instruction needs a register "pair"
which can only exist with "adjacent" registers: $r0r1, $r2r3, etc.
Ok; and you don't have a way to ask GCC for an arbitrary register pair, so you
chose r62+r63 as they can be clobbered?
It might be worth looking into using an __int128_t to give you a register pair,
assuming your calling convention mandates adjacency of the two halves. That
could give you the pair while giving the compiler freedom to choose any
suitable pair (assuming you have a suitable operand modifier to extract the
low/high registers from the asm operand).
+ __asm__ __volatile__ ( \
+ /* Fence to guarantee previous store to be committed */ \
+ "fence\n" \

This implies you can implement the relaxed form of cmpxchg().

Ok.

What ordering do you get by default, and do you have any other barriers (e.g.
for acquire/release semantics), or just "fence" ?

We have two barrier types:
- fence: ensure that all uncached memory operations are committed to
  memory, typically used to ensure a write to memory is visible to
  other cores.
- barrier: flush the core instruction pipeline and memory system
In the absence of barriers do you have any ordering guarantee? e.g. weak like
arm/power, TSO like x86, or something else entirely?
When you say "uncached memory operations", does that mean that you have cached
memory operations which are not affected by the barriers, or is "uncached memory
operations" just meant to read as "outstanding/incomplete memory operations" ?