Re: [PATCH bpf-next v2 4/9] bpf: Introduce load-acquire and store-release instructions
From: Ilya Leoshkevich
Date: Fri Feb 07 2025 - 06:30:02 EST
On Fri, 2025-02-07 at 02:05 +0000, Peilin Ye wrote:
> Introduce BPF instructions with load-acquire and store-release
> semantics, as discussed in [1]. The following new flags are defined:
>
> BPF_ATOMIC_LOAD 0x10
> BPF_ATOMIC_STORE 0x20
> BPF_ATOMIC_TYPE(imm) ((imm) & 0xf0)
>
> BPF_RELAXED 0x0
> BPF_ACQUIRE 0x1
> BPF_RELEASE 0x2
> BPF_ACQ_REL 0x3
> BPF_SEQ_CST 0x4
>
> BPF_LOAD_ACQ (BPF_ATOMIC_LOAD | BPF_ACQUIRE)
> BPF_STORE_REL (BPF_ATOMIC_STORE | BPF_RELEASE)
>
> A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm'
> field set to BPF_LOAD_ACQ (0x11).
>
> Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction with
> the 'imm' field set to BPF_STORE_REL (0x22).
>
> Unlike existing atomic operations that only support BPF_W (32-bit) and
> BPF_DW (64-bit) size modifiers, load-acquires and store-releases also
> support BPF_B (8-bit) and BPF_H (16-bit). An 8- or 16-bit load-acquire
> zero-extends the value before writing it to a 32-bit register, just
> like ARM64 instruction LDARH and friends.
>
> As an example, consider the following 64-bit load-acquire BPF
> instruction:
>
> db 10 00 00 11 00 00 00 r0 = load_acquire((u64 *)(r1 + 0x0))
>
> opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX
> imm (0x00000011): BPF_LOAD_ACQ
>
> Similarly, a 16-bit BPF store-release:
>
> cb 21 00 00 22 00 00 00 store_release((u16 *)(r1 + 0x0), w2)
>
> opcode (0xcb): BPF_ATOMIC | BPF_H | BPF_STX
> imm (0x00000022): BPF_STORE_REL
>
> In arch/{arm64,s390,x86}/net/bpf_jit_comp.c, have
> bpf_jit_supports_insn(..., /*in_arena=*/true) return false for the new
> instructions, until the corresponding JIT compiler supports them.
>
> [1]
> https://lore.kernel.org/all/20240729183246.4110549-1-yepeilin@xxxxxxxxxx/
>
> Acked-by: Eduard Zingerman <eddyz87@xxxxxxxxx>
> Signed-off-by: Peilin Ye <yepeilin@xxxxxxxxxx>
> ---
> arch/arm64/net/bpf_jit_comp.c | 4 +++
> arch/s390/net/bpf_jit_comp.c | 14 +++++---
> arch/x86/net/bpf_jit_comp.c | 4 +++
> include/linux/bpf.h | 11 ++++++
> include/linux/filter.h | 2 ++
> include/uapi/linux/bpf.h | 13 +++++++
> kernel/bpf/core.c | 63 ++++++++++++++++++++++++++++++----
> kernel/bpf/disasm.c | 12 +++++++
> kernel/bpf/verifier.c | 45 ++++++++++++++++++++++--
> tools/include/uapi/linux/bpf.h | 13 +++++++
> 10 files changed, 168 insertions(+), 13 deletions(-)
Acked-by: Ilya Leoshkevich <iii@xxxxxxxxxxxxx>
s390x has a strong memory model, and the regular load and store
instructions are atomic as long as operand addresses are aligned.
IIUC the verifier already enforces this unless BPF_F_ANY_ALIGNMENT
is set, in which case whoever loaded the program is responsible for the
consequences: memory accesses that happen to be unaligned would
not trigger an exception, but they would not be atomic either.
So I can implement the new instructions as normal loads/stores after
this series is merged.