Re: [PATCH bpf-next] arm64, bpf: add internal-only MOV instruction to resolve per-CPU addrs

From: Andrii Nakryiko
Date: Fri Apr 05 2024 - 14:11:06 EST


On Fri, Apr 5, 2024 at 8:48 AM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Fri, Apr 5, 2024 at 2:17 AM Puranjay Mohan <puranjay12@xxxxxxxxx> wrote:
> >
> > Support an instruction for resolving absolute addresses of per-CPU
> > data from their per-CPU offsets. This instruction is internal-only and
> > users are not allowed to use it directly. For now, it will only be
> > used for internal inlining optimizations between the BPF verifier and
> > BPF JITs.
> >
> > Since commit 7158627686f0 ("arm64: percpu: implement optimised pcpu
> > access using tpidr_el1"), the per-cpu offset for the CPU is stored in
> > the tpidr_el1/2 register of that CPU.
> >
> > To support this BPF instruction in the ARM64 JIT, the following ARM64
> > instructions are emitted:
> >
> > mov dst, src // Move src to dst, if src != dst
> > mrs tmp, tpidr_el1/2 // Move the per-cpu offset of the current cpu into tmp.
> > add dst, dst, tmp // Add the per-cpu offset to dst.
> >
> > If CONFIG_SMP is not defined, nothing is emitted when src == dst, and
> > only mov dst, src is emitted when src != dst.
> >
> > To measure the performance improvement provided by this change, the
> > benchmark in [1] was used:
> >
> > Before:
> > glob-arr-inc : 23.597 ± 0.012M/s
> > arr-inc : 23.173 ± 0.019M/s
> > hash-inc : 12.186 ± 0.028M/s
> >
> > After:
> > glob-arr-inc : 23.819 ± 0.034M/s
> > arr-inc : 23.285 ± 0.017M/s
> > hash-inc : 12.419 ± 0.011M/s
> >
> > [1] https://github.com/anakryiko/linux/commit/8dec900975ef
>
> You don't see as big of a gain, because bpf_get_smp_processor_id()
> is not inlined yet on arm64.
>

yep, would be nice to add ARM64 and RISC-V support there as well.
Though it feels like supporting this directly in the BPF JIT might
actually be easier for RISC-V/ARM64, not sure?

> But even without it I expected bigger gains.
> Could you do 'perf report' before/after ?
> Just want to see what's on top.

I also ran `bpftool prog dump xlated id <progid>` and `bpftool prog dump
jited id <progid>` to validate the expected inlined BPF instructions and
the jitted code, so it might be a good idea to do that as well.

Either way, thanks for working on this!

>
> >
> > Signed-off-by: Puranjay Mohan <puranjay12@xxxxxxxxx>
> > ---
> >  arch/arm64/include/asm/insn.h |  7 +++++++
> >  arch/arm64/lib/insn.c         | 11 +++++++++++
> >  arch/arm64/net/bpf_jit.h      |  6 ++++++
> >  arch/arm64/net/bpf_jit_comp.c | 16 ++++++++++++++++
> >  4 files changed, 40 insertions(+)
> >

[...]