Re: [PATCH bpf-next v2 4/9] bpf: Introduce load-acquire and store-release instructions

From: Ilya Leoshkevich
Date: Fri Feb 07 2025 - 06:30:02 EST


On Fri, 2025-02-07 at 02:05 +0000, Peilin Ye wrote:
> Introduce BPF instructions with load-acquire and store-release
> semantics, as discussed in [1].  The following new flags are defined:
>
>   BPF_ATOMIC_LOAD         0x10
>   BPF_ATOMIC_STORE        0x20
>   BPF_ATOMIC_TYPE(imm)    ((imm) & 0xf0)
>
>   BPF_RELAXED        0x0
>   BPF_ACQUIRE        0x1
>   BPF_RELEASE        0x2
>   BPF_ACQ_REL        0x3
>   BPF_SEQ_CST        0x4
>
>   BPF_LOAD_ACQ       (BPF_ATOMIC_LOAD | BPF_ACQUIRE)
>   BPF_STORE_REL      (BPF_ATOMIC_STORE | BPF_RELEASE)
>
> A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm'
> field set to BPF_LOAD_ACQ (0x11).
>
> Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction with
> the 'imm' field set to BPF_STORE_REL (0x22).
>
> Unlike existing atomic operations that only support BPF_W (32-bit) and
> BPF_DW (64-bit) size modifiers, load-acquires and store-releases also
> support BPF_B (8-bit) and BPF_H (16-bit).  An 8- or 16-bit load-acquire
> zero-extends the value before writing it to a 32-bit register, just like
> the ARM64 instruction LDARH and friends.
>
> As an example, consider the following 64-bit load-acquire BPF
> instruction:
>
>   db 10 00 00 11 00 00 00  r0 = load_acquire((u64 *)(r1 + 0x0))
>
>   opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX
>   imm (0x00000011): BPF_LOAD_ACQ
>
> Similarly, a 16-bit BPF store-release:
>
>   cb 21 00 00 22 00 00 00  store_release((u16 *)(r1 + 0x0), w2)
>
>   opcode (0xcb): BPF_ATOMIC | BPF_H | BPF_STX
>   imm (0x00000022): BPF_STORE_REL
>
> In arch/{arm64,s390,x86}/net/bpf_jit_comp.c, have
> bpf_jit_supports_insn(..., /*in_arena=*/true) return false for the new
> instructions, until the corresponding JIT compiler supports them.
>
> [1]
> https://lore.kernel.org/all/20240729183246.4110549-1-yepeilin@xxxxxxxxxx/
>
> Acked-by: Eduard Zingerman <eddyz87@xxxxxxxxx>
> Signed-off-by: Peilin Ye <yepeilin@xxxxxxxxxx>
> ---
>  arch/arm64/net/bpf_jit_comp.c  |  4 +++
>  arch/s390/net/bpf_jit_comp.c   | 14 +++++---
>  arch/x86/net/bpf_jit_comp.c    |  4 +++
>  include/linux/bpf.h            | 11 ++++++
>  include/linux/filter.h         |  2 ++
>  include/uapi/linux/bpf.h       | 13 +++++++
>  kernel/bpf/core.c              | 63 ++++++++++++++++++++++++++++++----
>  kernel/bpf/disasm.c            | 12 +++++++
>  kernel/bpf/verifier.c          | 45 ++++++++++++++++++++++--
>  tools/include/uapi/linux/bpf.h | 13 +++++++
>  10 files changed, 168 insertions(+), 13 deletions(-)

Acked-by: Ilya Leoshkevich <iii@xxxxxxxxxxxxx>

s390x has a strong memory model, and the regular load and store
instructions are atomic as long as operand addresses are aligned.

IIUC the verifier already enforces alignment unless BPF_F_ANY_ALIGNMENT
is set, in which case whoever loaded the program is responsible for the
consequences: memory accesses that happen to be unaligned would not
trigger an exception, but they would not be atomic either.

So I can implement the new instructions as normal loads/stores after
this series is merged.