Re: [PATCH v2 bpf-next 00/13] Atomics for eBPF

From: Andrii Nakryiko
Date: Tue Dec 01 2020 - 21:01:30 EST


On Mon, Nov 30, 2020 at 7:51 PM Yonghong Song <yhs@xxxxxx> wrote:
>
>
>
> On 11/30/20 9:22 AM, Yonghong Song wrote:
> >
> >
> > On 11/28/20 5:40 PM, Alexei Starovoitov wrote:
> >> On Fri, Nov 27, 2020 at 09:53:05PM -0800, Yonghong Song wrote:
> >>>
> >>>
> >>> On 11/27/20 9:57 AM, Brendan Jackman wrote:
> >>>> Status of the patches
> >>>> =====================
> >>>>
> >>>> Thanks for the reviews! Differences from v1->v2 [1]:
> >>>>
> >>>> * Fixed mistakes in the netronome driver
> >>>>
> >>>> * Added sub, add, or, xor operations
> >>>>
> >>>> * The above led to some refactors to keep things readable. (Maybe I
> >>>> should have just waited until I'd implemented these before starting
> >>>> the review...)
> >>>>
> >>>> * Replaced BPF_[CMP]SET | BPF_FETCH with just BPF_[CMP]XCHG, which
> >>>> include the BPF_FETCH flag
> >>>>
> >>>> * Added a bit of documentation. Suggestions welcome for more places
> >>>> to dump this info...
> >>>>
> >>>> The prog_test that's added depends on Clang/LLVM features added by
> >>>> Yonghong in
> >>>> https://reviews.llvm.org/D72184
> >>>>
> >>>> This only includes a JIT implementation for x86_64 - I don't plan to
> >>>> implement JIT support myself for other architectures.
> >>>>
> >>>> Operations
> >>>> ==========
> >>>>
> >>>> This patchset adds atomic operations to the eBPF instruction set. The
> >>>> use-case that motivated this work was a trivial and efficient way to
> >>>> generate globally-unique cookies in BPF progs, but I think it's
> >>>> obvious that these features are pretty widely applicable. The
> >>>> instructions that are added here can be summarised with this list of
> >>>> kernel operations:
> >>>>
> >>>> * atomic[64]_[fetch_]add
> >>>> * atomic[64]_[fetch_]sub
> >>>> * atomic[64]_[fetch_]and
> >>>> * atomic[64]_[fetch_]or
> >>>
> >>> * atomic[64]_[fetch_]xor
> >>>
> >>>> * atomic[64]_xchg
> >>>> * atomic[64]_cmpxchg
> >>>
> >>> Thanks. Overall it looks good to me, but I did not check the jit
> >>> part carefully as I am not an expert in x64 assembly...
> >>>
> >>> This patch also introduced atomic[64]_{sub,and,or,xor}, similar to
> >>> xadd. I am not sure whether it is necessary. For one thing,
> >>> users can just use atomic[64]_fetch_{sub,and,or,xor} to ignore
> >>> return value and they will achieve the same result, right?
> >>> On the llvm side, there is no ready-to-use gcc builtin matching
> >>> atomic[64]_{sub,and,or,xor}, which do not return a value. If we
> >>> go this route, we will need to invent additional bpf-specific
> >>> builtins.
> >>
> >> I think bpf specific builtins are overkill.
> >> As you said the users can use atomic_fetch_xor() and ignore
> >> return value. I think llvm backend should be smart enough to use
> >> BPF_ATOMIC | BPF_XOR insn without BPF_FETCH bit in such case.
> >> But if it's too cumbersome to do at the moment, we can skip this
> >> optimization for now.
> >
> > Initially, we can emit all of them with the BPF_FETCH bit set, since
> > at that point we do not have the def-use chain. Later on we can add a
> > machine SSA IR phase and check whether the result of, say,
> > atomic_fetch_or() is used or not. If not, we can change the
> > instruction to atomic_or.
>
> Just implemented what we discussed above in llvm:
> https://reviews.llvm.org/D72184
> main changes:
> 1. atomic_fetch_sub (and later atomic_sub) is gone. llvm will
> transparently transform it into a negation followed by
> atomic_fetch_add or atomic_add (xadd). The kernel can remove the
> atomic_fetch_sub/atomic_sub insns.
> 2. added new instructions for atomic_{and, or, xor}.
> 3. for gcc builtins, e.g. __sync_fetch_and_or(), if the return
> value is used, atomic_fetch_or will be generated. Otherwise,
> atomic_or will be generated.

Great, this means that all existing valid uses of
__sync_fetch_and_add() will generate BPF_XADD instructions and will
work on old kernels, right?

If that's the case, do we still need cpu=v4? The new instructions are
*only* going to be generated if the user uses previously unsupported
__sync_fetch_xxx() intrinsics. So, in effect, the user consciously
opts into using new BPF instructions. cpu=v4 seems like an unnecessary
tautology then?