Re: [PATCH RFC] arm64: eBPF JIT compiler

From: Alexei Starovoitov
Date: Wed Jul 02 2014 - 17:28:51 EST


On Tue, Jul 1, 2014 at 10:20 PM, Zi Shen Lim <zlim.lnx@xxxxxxxxx> wrote:
> The JIT compiler emits A64 instructions. It supports eBPF only.
> Legacy BPF is supported thanks to conversion by BPF core.
>
> JIT is enabled in the same way as for other architectures:
>
> echo 1 > /proc/sys/net/core/bpf_jit_enable
>
> Or for additional compiler output:
>
> echo 2 > /proc/sys/net/core/bpf_jit_enable
>
> See Documentation/networking/filter.txt for more information.
>
> The implementation passes all 57 tests in lib/test_bpf.c
> on ARMv8 Foundation Model :)

Looks great. Comments below:

> --- a/arch/arm64/Makefile
> +++ b/arch/arm64/Makefile
> @@ -43,6 +43,7 @@ TEXT_OFFSET := 0x00080000
> export TEXT_OFFSET GZFLAGS
>
> core-y += arch/arm64/kernel/ arch/arm64/mm/
> +core-y += arch/arm64/net/

please use instead:
core-$(CONFIG_NET)

> +
> +#define BITSMASK(bits) ((1 << (bits)) - 1)

there is GENMASK macro already.

> +/* Compare & branch (immediate) */
> +static inline u32 A64_COMP_BRANCH_IMM(int sf, int op, int imm19, int Rt)

odd function name. lower case it?

> +/* Conditional branch (immediate) */
> +static inline u32 A64_COND_BRANCH_IMM(int o1, int imm19, int o0, int cond)

same and in several other places.
I guess you're trying to make the usage look similar for macro and function
calls. I don't think the look-alike is worth it. Functions should be lower case.

> +#define EMIT(insn) emit(insn, ctx)

extra macro just to save one argument? I would add ...,ctx) explicitly.

> +#define EMIT_A64_MOV_I64(reg, val) emit_A64_MOV_I64(reg, val, ctx)

same here and in other cases.

> +static void build_prologue(struct jit_ctx *ctx)
> +{
> + const u8 r6 = bpf2a64[BPF_REG_6];
> + const u8 r7 = bpf2a64[BPF_REG_7];
> + const u8 r8 = bpf2a64[BPF_REG_8];
> + const u8 r9 = bpf2a64[BPF_REG_9];
> + const u8 fp = bpf2a64[BPF_REG_FP];
> + const u8 ra = bpf2a64[BPF_REG_A];
> + const u8 rx = bpf2a64[BPF_REG_X];
> + const u8 tmp1 = bpf2a64[TMP_REG_1];
> + const u8 tmp2 = bpf2a64[TMP_REG_2];
> + int stack_size = MAX_BPF_STACK;
> +
> + stack_size += 16; /* extra for skb_copy_bit buffer */

why extra 16? skb_copy_bits is called with max len 4

> + /* Save callee-saved register */
> + EMIT(A64_PUSH(r6, r7, A64_SP));

simd style double push requires consecutive registers or not? Just curious.

> +/* From load_pointer in net/core/filter.c.
> + * XXX: should we just export it? */
> +extern void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb,
> + int k, unsigned int size);
> +static void *load_pointer_helper(const struct sk_buff *skb, int k,
> + unsigned int size, void *buffer)

That's an interesting way of supporting negative offsets!
probably makes sense to move load_pointer() from net/core/filter.c into filter.h
and export bpf_internal_load_pointer_neg_helper() in filter.h as well,
but I'm not sure which tree you want this stuff to go through.
If it's arm tree, then it's better to keep things as you did and do a
cleanup patch
later. If net-next, then it's better to do it cleanly right away.

> + case BPF_ALU | BPF_MOD | BPF_X:
> + case BPF_ALU64 | BPF_MOD | BPF_X:
> + ctx->tmp_used = 1;
> + EMIT(A64_UDIV(is64, tmp, dst, src));
> + EMIT(A64_MUL(is64, tmp, tmp, src));
> + EMIT(A64_SUB(is64, dst, dst, tmp));
> + break;

there needs to be run-time check for src == 0

> + /* dst = dst OP imm */
> + case BPF_ALU | BPF_ADD | BPF_K:
> + case BPF_ALU64 | BPF_ADD | BPF_K:
> + ctx->tmp_used = 1;
> + EMIT_A64_MOV_I(is64, tmp, imm);
> + EMIT(A64_ADD(is64, dst, dst, tmp));
> + break;

Potential for future optimizations on small immediate?

> + /* function call */
> + case BPF_JMP | BPF_CALL:
> + {
> + const u8 r0 = bpf2a64[BPF_REG_0];
> + const u64 func = (u64)__bpf_call_base + imm;
> +
> + ctx->tmp_used = 1;
> + EMIT_A64_MOV_I64(tmp, func);
> + EMIT(A64_PUSH(A64_FP, A64_LR, A64_SP));
> + EMIT(A64_MOV(1, A64_FP, A64_SP));
> + EMIT(A64_BLR(tmp));

Aren't on arm64 kernel and module_alloc() addresses in the same 32-bit range?
Do you really need 'jump by register' then? Regular 'bl' would be much faster.

> + /* R0 = ntohx(*(size *)(((struct sk_buff *)R6)->data + imm)) */
> + case BPF_LD | BPF_ABS | BPF_W:
> + case BPF_LD | BPF_ABS | BPF_H:
> + case BPF_LD | BPF_ABS | BPF_B:
> + case BPF_LD | BPF_ABS | BPF_DW:

there is no such LD_ABS + DW instruction yet.
Would be trivial to add, but let's not rush it in just because it's so easy.

> + /* R0 = ntohx(*(size *)(((struct sk_buff *)R6)->data + src + imm)) */
> + case BPF_LD | BPF_IND | BPF_W:
> + case BPF_LD | BPF_IND | BPF_H:
> + case BPF_LD | BPF_IND | BPF_B:
> + case BPF_LD | BPF_IND | BPF_DW:
> + {
> + const u8 r0 = bpf2a64[BPF_REG_0]; /* r0 = return value */
> + const u8 r6 = bpf2a64[BPF_REG_6]; /* r6 = pointer to sk_buff */
> + const u8 fp = bpf2a64[BPF_REG_FP];
> + const u8 r1 = bpf2a64[BPF_REG_1]; /* r1: struct sk_buff *skb */
> + const u8 r2 = bpf2a64[BPF_REG_2]; /* r2: int k */
> + const u8 r3 = bpf2a64[BPF_REG_3]; /* r3: unsigned int size */
> + const u8 r4 = bpf2a64[BPF_REG_4]; /* r4: void *buffer */
> + const u8 r5 = bpf2a64[BPF_REG_5]; /* r5: void *(*func)(...) */
> + int size;
> +
> + EMIT(A64_MOV(1, r1, r6));
> + EMIT_A64_MOV_I(0, r2, imm);
> + if (BPF_MODE(code) == BPF_IND)
> + EMIT(A64_ADD(0, r2, r2, src));
> + switch (BPF_SIZE(code)) {
> + case BPF_W:
> + size = 4;
> + break;
> + case BPF_H:
> + size = 2;
> + break;
> + case BPF_B:
> + size = 1;
> + break;
> + case BPF_DW:
> + size = 8;

there is no DW in ld_abs/ld_ind. Let's not rush it in.

> +notyet:
> + pr_info("*** NOT YET: opcode %02x ***\n", code);
> + return -EFAULT;

It's ok to implement JIT support step by step.
Just change pr_info() to pr_info_once() not to spam the logs.

> + default:
> + pr_err("unknown opcode %02x\n", code);

same.

Overall looks great. Thank you for doing all this work!

Alexei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/