Re: [PATCH v5] bpf, x86_32: add eBPF JIT compiler for ia32

From: Daniel Borkmann
Date: Wed May 02 2018 - 18:13:10 EST


Hi Wang,

On 04/29/2018 02:37 PM, Wang YanQing wrote:
> The JIT compiler emits ia32 bit instructions. Currently, It supports eBPF
> only. Classic BPF is supported because of the conversion by BPF core.
>
> Almost all instructions from eBPF ISA supported except the following:
> BPF_ALU64 | BPF_DIV | BPF_K
> BPF_ALU64 | BPF_DIV | BPF_X
> BPF_ALU64 | BPF_MOD | BPF_K
> BPF_ALU64 | BPF_MOD | BPF_X
> BPF_STX | BPF_XADD | BPF_W
> BPF_STX | BPF_XADD | BPF_DW
>
> It doesn't support BPF_JMP|BPF_CALL with BPF_PSEUDO_CALL at the moment.
>
> IA32 has few general purpose registers, EAX|EDX|ECX|EBX|ESI|EDI. I use
> EAX|EDX|ECX|EBX as temporary registers to simulate instructions in eBPF
> ISA, and allocate ESI|EDI to BPF_REG_AX for constant blinding, all others
> eBPF registers, R0-R10, are simulated through scratch space on stack.
>
[...]
>
> The numbers show we get 30%~50% improvement.
>
> See Documentation/networking/filter.txt for more information.
>
> Signed-off-by: Wang YanQing <udknight@xxxxxxxxx>

Sorry for the delay. There's still a memory leak in this patch I found
while reviewing, more below and how to fix it. Otherwise few small nits
that would be nice to address in the respin along with it.

> ---
> Changes v4-v5:
> 1:Delete is_on_stack, BPF_REG_AX is the only one
> on real hardware registers, so just check with
> it.
> 2:Apply commit 1612a981b766 ("bpf, x64: fix JIT emission
> for dead code"), suggested by Daniel Borkmann.
>
> Changes v3-v4:
> 1:Fix changelog in commit.
> I install llvm-6.0, then test_progs willn't report errors.
> I submit another patch:
> "bpf: fix misaligned access for BPF_PROG_TYPE_PERF_EVENT program type on x86_32 platform"
> to fix another problem, after that patch, test_verifier willn't report errors too.
> 2:Fix clear r0[1] twice unnecessarily in *BPF_IND|BPF_ABS* simulation.
>
> Changes v2-v3:
> 1:Move BPF_REG_AX to real hardware registers for performance reason.
> 3:Using bpf_load_pointer instead of bpf_jit32.S, suggested by Daniel Borkmann.
> 4:Delete partial codes in 1c2a088a6626, suggested by Daniel Borkmann.
> 5:Some bug fixes and comments improvement.
>
> Changes v1-v2:
> 1:Fix bug in emit_ia32_neg64.
> 2:Fix bug in emit_ia32_arsh_r64.
> 3:Delete filename in top level comment, suggested by Thomas Gleixner.
> 4:Delete unnecessary boiler plate text, suggested by Thomas Gleixner.
> 5:Rewrite some words in changelog.
> 6:CodingSytle improvement and a little more comments.
>
> arch/x86/Kconfig | 2 +-
> arch/x86/include/asm/nospec-branch.h | 26 +-
> arch/x86/net/Makefile | 9 +-
> arch/x86/net/bpf_jit_comp32.c | 2527 ++++++++++++++++++++++++++++++++++
> 4 files changed, 2559 insertions(+), 5 deletions(-)
> create mode 100644 arch/x86/net/bpf_jit_comp32.c
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 00fcf81..1f5fa2f 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -137,7 +137,7 @@ config X86
> select HAVE_DMA_CONTIGUOUS
> select HAVE_DYNAMIC_FTRACE
> select HAVE_DYNAMIC_FTRACE_WITH_REGS
> - select HAVE_EBPF_JIT if X86_64
> + select HAVE_EBPF_JIT
> select HAVE_EFFICIENT_UNALIGNED_ACCESS
> select HAVE_EXIT_THREAD
> select HAVE_FENTRY if X86_64 || DYNAMIC_FTRACE
> diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> index f928ad9..a4c7ca4 100644
> --- a/arch/x86/include/asm/nospec-branch.h
> +++ b/arch/x86/include/asm/nospec-branch.h
> @@ -291,14 +291,17 @@ static inline void indirect_branch_prediction_barrier(void)
> * lfence
> * jmp spec_trap
> * do_rop:
> - * mov %rax,(%rsp)
> + * mov %rax,(%rsp) for x86_64
> + * mov %edx,(%esp) for x86_32
> * retq
> *
> * Without retpolines configured:
> *
> - * jmp *%rax
> + * jmp *%rax for x86_64
> + * jmp *%edx for x86_32
> */
> #ifdef CONFIG_RETPOLINE
> +#ifdef CONFIG_X86_64
> # define RETPOLINE_RAX_BPF_JIT_SIZE 17
> # define RETPOLINE_RAX_BPF_JIT() \
> EMIT1_off32(0xE8, 7); /* callq do_rop */ \
> @@ -310,9 +313,28 @@ static inline void indirect_branch_prediction_barrier(void)
> EMIT4(0x48, 0x89, 0x04, 0x24); /* mov %rax,(%rsp) */ \
> EMIT1(0xC3); /* retq */
> #else
> +# define RETPOLINE_EDX_BPF_JIT() \
> +do { \
> + EMIT1_off32(0xE8, 7); /* call do_rop */ \
> + /* spec_trap: */ \
> + EMIT2(0xF3, 0x90); /* pause */ \
> + EMIT3(0x0F, 0xAE, 0xE8); /* lfence */ \
> + EMIT2(0xEB, 0xF9); /* jmp spec_trap */ \
> + /* do_rop: */ \
> + EMIT3(0x89, 0x14, 0x24); /* mov %edx,(%esp) */ \
> + EMIT1(0xC3); /* ret */ \
> +} while (0)

Nit: by the way, the RETPOLINE_RAX_BPF_JIT() doesn't have a do {} while (0)
construct but for RETPOLINE_EDX_BPF_JIT(), you add it. Could you make both
consistent to each other? I don't really mind which way as long as they're
both consistent.

> +#endif
> +#else /* !CONFIG_RETPOLINE */
> +
> +#ifdef CONFIG_X86_64
> # define RETPOLINE_RAX_BPF_JIT_SIZE 2
> # define RETPOLINE_RAX_BPF_JIT() \
> EMIT2(0xFF, 0xE0); /* jmp *%rax */
> +#else
> +# define RETPOLINE_EDX_BPF_JIT() \
> + EMIT2(0xFF, 0xE2) /* jmp *%edx */
> +#endif
> #endif
>
> #endif /* _ASM_X86_NOSPEC_BRANCH_H_ */
[...]
> +struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
> +{
> + struct bpf_binary_header *header = NULL;
> + struct bpf_prog *tmp, *orig_prog = prog;
> + int proglen, oldproglen = 0;
> + struct jit_context ctx = {};
> + bool tmp_blinded = false;
> + u8 *image = NULL;
> + int *addrs;
> + int pass;
> + int i;
> +
> + if (!prog->jit_requested)
> + return orig_prog;
> +
> + tmp = bpf_jit_blind_constants(prog);
> + /* If blinding was requested and we failed during blinding,
> + * we must fall back to the interpreter.
> + */
> + if (IS_ERR(tmp))
> + return orig_prog;
> + if (tmp != prog) {
> + tmp_blinded = true;
> + prog = tmp;
> + }
> +
> + addrs = kmalloc(prog->len * sizeof(*addrs), GFP_KERNEL);
> + if (!addrs) {
> + prog = orig_prog;
> + goto out;
> + }
> +
> + /* Before first pass, make a rough estimation of addrs[]
> + * each bpf instruction is translated to less than 64 bytes
> + */
> + for (proglen = 0, i = 0; i < prog->len; i++) {
> + proglen += 64;
> + addrs[i] = proglen;
> + }
> + ctx.cleanup_addr = proglen;
> +
> + /* JITed image shrinks with every pass and the loop iterates
> + * until the image stops shrinking. Very large bpf programs
> + * may converge on the last pass. In such case do one more
> + * pass to emit the final image
> + */
> + for (pass = 0; pass < 20 || image; pass++) {
> + proglen = do_jit(prog, addrs, image, oldproglen, &ctx);
> + if (proglen <= 0) {
> + image = NULL;
> + if (header)
> + bpf_jit_binary_free(header);
> + prog = orig_prog;
> + goto out_addrs;
> + }
> + if (image) {
> + if (proglen != oldproglen) {
> + pr_err("bpf_jit: proglen=%d != oldproglen=%d\n",
> + proglen, oldproglen);
> + prog = orig_prog;
> + goto out_addrs;

This one above will leak image, you need to also free the header by
calling bpf_jit_binary_free(). You've copied this from x86-64 JIT,
fixed there:

https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=3aab8884c9eb99189a3569ac4e6b205371c9ac0b

We've recently merged this cleanup as well for x86-64 JIT, could you
please adapt the x86-32 JIT with similar cleanup as here:

https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=a2c7a98301d9f9bcfc4244de04252a04c0b68a79

Would be nice to address in one go so we don't need a follow-up cleanup.
Rest looks good from my side.

Thanks,
Daniel

> + }
> + break;
> + }
> + if (proglen == oldproglen) {
> + header = bpf_jit_binary_alloc(proglen, &image,
> + 1, jit_fill_hole);
> + if (!header) {
> + prog = orig_prog;
> + goto out_addrs;
> + }
> + }
> + oldproglen = proglen;
> + cond_resched();
> + }
> +
> + if (bpf_jit_enable > 1)
> + bpf_jit_dump(prog->len, proglen, pass + 1, image);
> +
> + if (image) {
> + bpf_jit_binary_lock_ro(header);
> + prog->bpf_func = (void *)image;
> + prog->jited = 1;
> + prog->jited_len = proglen;
> + } else {
> + prog = orig_prog;
> + }
> +
> +out_addrs:
> + kfree(addrs);
> +out:
> + if (tmp_blinded)
> + bpf_jit_prog_release_other(prog, prog == orig_prog ?
> + tmp : orig_prog);
> + return prog;
> +}
>