Re: [PATCH 1/1] selftests/bpf: add cls_redirect classifier

From: Daniel Borkmann
Date: Fri May 01 2020 - 19:49:03 EST


On 4/27/20 11:45 AM, Lorenz Bauer wrote:
On Sun, 26 Apr 2020 at 18:33, Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
[...]
+/* Linux packet pointers are either aligned to NET_IP_ALIGN (aka 2 bytes),
+ * or not aligned if the arch supports efficient unaligned access.
+ *
+ * Since the verifier ensures that eBPF packet accesses follow these rules,
+ * we can tell LLVM to emit code as if we always had a larger alignment.
+ * It will yell at us if we end up on a platform where this is not valid.
+ */
+typedef uint8_t *net_ptr __attribute__((align_value(8)));

Wow. I didn't know about this attribute.
I wonder whether it can help Daniel's memcpy hack.

Yes, I think so.

Just for some more context [0]. I think the problem is a bit more complex in
general. Generally, _any_ kind of pointer to some data (except for the stack)
is currently treated as byte-by-byte copy from __builtin_memcpy() and other
similarly available __builtin_*() helpers on BPF backend since the backend
cannot make any assumptions about the data's alignment and whether unaligned
access from the underlying arch is ok & efficient (the latter the verifier
does judge for us however). So it's definitely not just limited to xdp->data.
There is also the issue that while access to any non-stack data can be
unaligned, access to the stack however cannot. I've discussed a while back
with Yonghong about potential solutions. One would be to add a small patch
to the BPF backend to enable __builtin_*() helpers to allow for unaligned
access which could then be opt-ed in e.g. via -mattr from llc for the case
when we know that the compiled program only runs on archs with efficient
unaligned access anyway. However, this still potentially breaks with the BPF
stack for the case when objects are, for example, larger than size 8 but with
a natural alignment smaller than 8 where __builtin_memcpy() would then decide
to emit dw-typed load/stores. But for these cases could then be annotated via
__aligned(8) on stack. So this is basically what we do right now as a generic
workaround in Cilium [0], meaning, our own memcpy/memset with optimal number
of instructions and __aligned(8) where needed; most of the time this __aligned(8)
is not needed, so it's really just a few places, and we also have a cocci
scripts to catch these during development if needed. Anyway, real thing would
be to allow the BPF stack for unaligned access as well and then BPF backend
could nicely solve this in a native way w/o any workarounds, but that is tbd.

Thanks,
Daniel

[0] https://github.com/cilium/cilium/blob/master/bpf/include/bpf/builtins.h