Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction

From: Daniel Borkmann
Date: Wed Nov 11 2015 - 07:21:19 EST


On 11/11/2015 12:58 PM, Will Deacon wrote:
On Wed, Nov 11, 2015 at 11:42:11AM +0100, Daniel Borkmann wrote:
On 11/11/2015 11:24 AM, Will Deacon wrote:
On Wed, Nov 11, 2015 at 09:49:48AM +0100, Arnd Bergmann wrote:
On Tuesday 10 November 2015 18:52:45 Z Lim wrote:
On Tue, Nov 10, 2015 at 4:42 PM, Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
On Tue, Nov 10, 2015 at 04:26:02PM -0800, Shi, Yang wrote:
On 11/10/2015 4:08 PM, Eric Dumazet wrote:
On Tue, 2015-11-10 at 14:41 -0800, Yang Shi wrote:
aarch64 doesn't have native support for XADD instruction, implement it by
the below instruction sequence:

aarch64 supports atomic add in ARMv8.1.
For ARMv8(.0), please consider using LDXR/STXR sequence.
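
For illustration, an exclusive load/store retry loop of the kind suggested above could look roughly like the sketch below. It is not the sequence from the patch, which is not quoted in this message. The helper name xadd32() is hypothetical, and the non-arm64 branch is only a fallback so the example builds on other hosts.

```c
#include <stdint.h>

/* Hypothetical helper: add val to *addr atomically, with no ordering implied. */
static inline void xadd32(uint32_t *addr, uint32_t val)
{
#if defined(__aarch64__)
    uint32_t tmp, status;

    __asm__ volatile(
        "1: ldxr  %w0, %2\n"        /* load-exclusive the current value   */
        "   add   %w0, %w0, %w3\n"  /* compute the new value              */
        "   stxr  %w1, %w0, %2\n"   /* try to store; status is 0 on success */
        "   cbnz  %w1, 1b\n"        /* lost the reservation, retry        */
        : "=&r"(tmp), "=&r"(status), "+Q"(*addr)
        : "r"(val));
#else
    /* Fallback for non-arm64 hosts: let the compiler pick the sequence. */
    __atomic_fetch_add(addr, val, __ATOMIC_RELAXED);
#endif
}
```

A JIT would emit the equivalent machine instructions directly rather than go through inline assembly; the C form here only shows the shape of the loop.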

Is it worth optimizing for the 8.1 case? It would add a bit of complexity
to make the code depend on the CPU feature, but it's certainly doable.

What's the atomicity required for? Put another way, what are we racing
with (I thought bpf was single-threaded)? Do we need to worry about
memory barriers?

Apologies if these are stupid questions, but all I could find was
samples/bpf/sock_example.c and it didn't help much :(

The equivalent code, in more readable restricted C syntax (which llvm
can compile), can be found in samples/bpf/sockex1_kern.c. There, the
built-in __sync_fetch_and_add() is translated into a BPF_XADD insn
variant.
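
As a rough user-space analogue of that pattern (not the kernel sample itself), the sketch below looks up a counter slot and bumps it with __sync_fetch_and_add(). The array counters[], its size NR_SLOTS, and the helpers lookup_slot() and account_packet() are hypothetical stand-ins for an eBPF array map and its lookup helper.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_SLOTS 256                 /* hypothetical map size */

static uint64_t counters[NR_SLOTS];  /* stand-in for an eBPF array map */

/* Stand-in for a map lookup: returns NULL when the key is out of range. */
static uint64_t *lookup_slot(uint32_t key)
{
    return key < NR_SLOTS ? &counters[key] : NULL;
}

/* Account len bytes against the slot for proto; returns 0 on success. */
static int account_packet(uint32_t proto, uint32_t len)
{
    uint64_t *value = lookup_slot(proto);

    if (!value)
        return -1;
    /* This is the builtin that the LLVM eBPF backend turns into BPF_XADD. */
    __sync_fetch_and_add(value, len);
    return 0;
}
```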

Yikes, so the memory-model for BPF is based around the deprecated GCC
__sync builtins, that inherit their semantics from ia64? Any reason not
to use the C11-compatible __atomic builtins[1] as a base?

Hmm, gcc doesn't have an eBPF compiler backend, so this won't work on
gcc at all. The eBPF backend in LLVM recognizes the __sync_fetch_and_add()
builtin and maps that to a BPF_XADD version (BPF_W or BPF_DW). In the
interpreter (__bpf_prog_run()), as Eric mentioned, this maps to atomic_add()
and atomic64_add(), respectively. So the struct bpf_insn prog[] you saw
from sock_example.c can be regarded as one possible equivalent program
section output from the compiler.

What you can race against is that an eBPF map can be _shared_ by
multiple eBPF programs that are attached somewhere in the system, and
they could all update a particular entry/counter from the map at the
same time.
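
To make the race concrete, here is a small user-space sketch in which several threads play the role of independent eBPF programs updating one shared map entry. The names prog_ctx, prog_run() and run_shared_counter() are hypothetical, and POSIX threads are assumed. With the atomic add the final count is exact; with a plain `*slot += 1` updates could be lost.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PROGS 8                  /* hypothetical upper bound */

struct prog_ctx {
    uint64_t *slot;                  /* the shared "map entry" */
    uint64_t iterations;             /* updates performed by each program */
};

/* One "program": bump the shared slot atomically, once per iteration. */
static void *prog_run(void *arg)
{
    struct prog_ctx *ctx = arg;

    for (uint64_t i = 0; i < ctx->iterations; i++)
        __sync_fetch_and_add(ctx->slot, 1);
    return NULL;
}

/* Run nprogs concurrent "programs" and return the final counter value. */
static uint64_t run_shared_counter(int nprogs, uint64_t iterations)
{
    pthread_t tids[MAX_PROGS];
    uint64_t slot = 0;
    struct prog_ctx ctx = { &slot, iterations };
    int started = 0;

    if (nprogs > MAX_PROGS)
        nprogs = MAX_PROGS;
    for (int i = 0; i < nprogs; i++) {
        if (pthread_create(&tids[i], NULL, prog_run, &ctx) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return slot;                     /* safe to read: all threads joined */
}
```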

Ok, so it does sound like eBPF needs to define/choose a memory-model and
I worry that riding on the back of __sync isn't necessarily the right
thing to do, particularly as it's fallen out of favour with the compiler
folks. On weakly-ordered architectures, it's also going to result in
heavy-weight barriers for all atomic operations.

Will

[1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
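
The barrier concern can be seen by comparing the two builtin families side by side. In the sketch below (the wrapper names add_sync() and add_relaxed() are hypothetical), the __sync form is documented by GCC as a full barrier, while the __atomic form takes an explicit memory order and, with __ATOMIC_RELAXED, requests atomicity only. Both return the value held before the addition.

```c
#include <stdint.h>

/* Legacy builtin: atomic add that also acts as a full memory barrier. */
static uint64_t add_sync(uint64_t *p, uint64_t v)
{
    return __sync_fetch_and_add(p, v);
}

/* C11-style builtin: atomic add with no ordering beyond atomicity. */
static uint64_t add_relaxed(uint64_t *p, uint64_t v)
{
    return __atomic_fetch_add(p, v, __ATOMIC_RELAXED);
}
```

On a weakly-ordered machine the first form typically costs extra barrier instructions that a plain counter update may not need, which is the trade-off raised above.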