Re: [RFC PATCH v3 15/37] kvx: Add atomic/locking headers

From: Arnd Bergmann
Date: Tue Jul 23 2024 - 04:26:38 EST


On Mon, Jul 22, 2024, at 11:41, ysionneau@xxxxxxxxxxxxx wrote:

> +#define ATOMIC64_RETURN_OP(op, c_op) \
> +static inline long arch_atomic64_##op##_return(long i, atomic64_t *v) \
> +{ \
> + long new, old, ret; \
> + \
> + do { \
> + old = arch_atomic64_read(v); \
> + new = old c_op i; \
> + ret = arch_cmpxchg(&v->counter, old, new); \
> + } while (ret != old); \
> + \
> + return new; \
> +}
> +
> +#define ATOMIC64_OP(op, c_op) \
> +static inline void arch_atomic64_##op(long i, atomic64_t *v) \
> +{ \
> + long new, old, ret; \
> + \
> + do { \
> + old = arch_atomic64_read(v); \
> + new = old c_op i; \
> + ret = arch_cmpxchg(&v->counter, old, new); \
> + } while (ret != old); \
> +}

These are not ideal: you have a retry loop around
arch_cmpxchg(), which is itself built from a loop.

You may want to change these to be expressed in terms of the
compiler intrinsics directly.
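
As a rough, untested illustration of the shape I mean: the
userspace sketch below uses the generic GCC __atomic builtins
as a stand-in, since I have not checked which kvx intrinsics
are available. The sketch_ names and the struct are made up
for the example and are not kernel API.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's atomic64_t. */
typedef struct {
	int64_t counter;
} sketch_atomic64_t;

/*
 * Value-returning variant: one builtin call, no open-coded retry
 * loop around a cmpxchg. SEQ_CST is used here because the kernel
 * expects value-returning atomics to be fully ordered.
 */
static inline int64_t sketch_atomic64_add_return(int64_t i,
						 sketch_atomic64_t *v)
{
	return __atomic_add_fetch(&v->counter, i, __ATOMIC_SEQ_CST);
}

/*
 * Void variant: the kernel does not require ordering for these,
 * so a relaxed read-modify-write is enough.
 */
static inline void sketch_atomic64_add(int64_t i, sketch_atomic64_t *v)
{
	(void)__atomic_fetch_add(&v->counter, i, __ATOMIC_RELAXED);
}
```

Whether the compiler turns this into something better than
the cmpxchg loop on kvx is something you would have to check
in the generated code.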

> +#ifndef _ASM_KVX_BARRIER_H
> +#define _ASM_KVX_BARRIER_H
> +
> +/* fence is sufficient to guarantee write ordering */
> +#define mb() __builtin_kvx_fence()
> +
> +#include <asm-generic/barrier.h>

mb() is a fairly strong barrier itself and gets used
as a fallback for all weaker barriers (read-only,
write-only, dma-only, smp-only). Have you checked
whether any of them can be weaker than
__builtin_kvx_fence(), e.g. a compiler-only barrier(),
like the SMP barriers on x86?
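
To illustrate the fallback pattern, here is a minimal
userspace sketch. All sketch_ names are hypothetical, the
full fence is a generic GCC builtin rather than the kvx one,
and the read-barrier override is only an assumption for the
sake of the example, not a statement about what the kvx
memory model allows.

```c
/* Compiler-only barrier: emits no instruction. */
#define sketch_barrier()	__asm__ __volatile__("" ::: "memory")

/* Full fence, standing in for the architecture's mb(). */
#define sketch_mb()		__atomic_thread_fence(__ATOMIC_SEQ_CST)

/*
 * Arch override (ASSUMPTION): only valid if the hardware never
 * reorders loads against other loads.
 */
#define sketch_smp_rmb()	sketch_barrier()

/* Generic part: anything not overridden falls back to the full fence. */
#ifndef sketch_smp_rmb
#define sketch_smp_rmb()	sketch_mb()
#endif
#ifndef sketch_smp_wmb
#define sketch_smp_wmb()	sketch_mb()
#endif

static int sketch_data;
static int sketch_ready;

static void sketch_publish(int value)
{
	__atomic_store_n(&sketch_data, value, __ATOMIC_RELAXED);
	sketch_smp_wmb();	/* data store before flag store */
	__atomic_store_n(&sketch_ready, 1, __ATOMIC_RELAXED);
}

static int sketch_consume(int *out)
{
	if (!__atomic_load_n(&sketch_ready, __ATOMIC_RELAXED))
		return 0;
	sketch_smp_rmb();	/* flag load before data load */
	*out = __atomic_load_n(&sketch_data, __ATOMIC_RELAXED);
	return 1;
}
```

Here the write barrier was not overridden and still costs a
full fence, while the read barrier compiles to nothing.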

> +
> +#include <asm/cmpxchg.h>
> +
> +static inline int fls(int x)
> +{
> + return 32 - __builtin_kvx_clzw(x);
> +}
> +
> +static inline int fls64(__u64 x)
> +{
> + return 64 - __builtin_kvx_clzd(x);
> +}

The generic fallback for these uses __builtin_clz().

If that produces the same output as the kvx-specific
intrinsics, you can just remove the above and
use the generic versions.
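
For comparison, the builtin-based generic helpers have
roughly the following shape. This is a userspace sketch with
made-up sketch_ names; the explicit zero check is there
because __builtin_clz(0) is undefined.

```c
/* Find last (most significant) set bit, 1-based; returns 0 for x == 0. */
static inline int sketch_fls(unsigned int x)
{
	return x ? (int)(sizeof(x) * 8) - __builtin_clz(x) : 0;
}

/* 64-bit variant, same convention. */
static inline int sketch_fls64(unsigned long long x)
{
	return x ? (int)(sizeof(x) * 8) - __builtin_clzll(x) : 0;
}
```

If the compiler can prove that the kvx count-leading-zeros
instruction gives a defined result for zero, the extra check
may disappear from the generated code, but that is worth
verifying rather than assuming.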

> +static __always_inline unsigned long __cmpxchg(unsigned long old,
> + unsigned long new,
> + volatile void *ptr, int size)
> +{
> + switch (size) {
> + case 4:
> + return __cmpxchg_u32(old, new, ptr);
> + case 8:
> + return __cmpxchg_u64(old, new, ptr);
> + default:
> + return __cmpxchg_called_with_bad_pointer();
> + }
> +}

With linux-6.11 you now also need to provide a single-byte
cmpxchg(). You can use cmpxchg_emu_u8() or provide a more
efficient custom one based on the 32/64-bit versions instead.
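
A minimal userspace sketch of the second option, built on a
32-bit compare-and-exchange of the containing aligned word.
The sketch_ names are hypothetical, and the GCC __atomic
builtins stand in for your __cmpxchg_u32(); it is untested on
kvx.

```c
#include <stdint.h>

/* View an aligned 32-bit word as four bytes, independent of endianness. */
union sketch_u8_32 {
	uint8_t b[4];
	uint32_t w;
};

/*
 * Emulated single-byte cmpxchg: returns the byte value observed
 * at *p, which equals oldval if and only if the exchange happened.
 */
static uint8_t sketch_cmpxchg_u8(volatile uint8_t *p, uint8_t oldval,
				 uint8_t newval)
{
	/* Aligned word containing the byte, and the byte's index in it. */
	uint32_t *p32 = (uint32_t *)((uintptr_t)p & ~(uintptr_t)0x3);
	unsigned int i = (uintptr_t)p & 0x3;
	union sketch_u8_32 old32, new32;

	old32.w = __atomic_load_n(p32, __ATOMIC_RELAXED);
	for (;;) {
		/* Target byte differs: fail without writing. */
		if (old32.b[i] != oldval)
			return old32.b[i];

		new32.w = old32.w;
		new32.b[i] = newval;

		/* On failure old32.w is refreshed with the current word. */
		if (__atomic_compare_exchange_n(p32, &old32.w, new32.w, 0,
						__ATOMIC_SEQ_CST,
						__ATOMIC_RELAXED))
			return oldval;
	}
}
```

The retry only happens when a neighbouring byte in the same
word changed between the load and the exchange.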

Arnd