Re: [PATCH RFC v4 net-next 01/26] net: filter: add "load 64-bit immediate" eBPF instruction

From: Alexei Starovoitov
Date: Wed Aug 13 2014 - 13:44:13 EST


On Wed, Aug 13, 2014 at 9:08 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> On Wed, Aug 13, 2014 at 12:57 AM, Alexei Starovoitov <ast@xxxxxxxxxxxx> wrote:
>> add BPF_LD_IMM64 instruction to load 64-bit immediate value into register.
>> All previous instructions were 8-byte. This is first 16-byte instruction.
>> Two consecutive 'struct bpf_insn' blocks are interpreted as single instruction:
>> insn[0/1].code = BPF_LD | BPF_DW | BPF_IMM
>> insn[0/1].dst_reg = destination register
>> insn[0].imm = lower 32-bit
>> insn[1].imm = upper 32-bit
>
> This might be unnecessarily difficult for fancy static analysis tools
> to reason about. Would it make sense to assign two different codes
> for this? For example, insn[0].code = code_for_load_low,
> insns[1].code = code_for_load_high, along with a verifier check that
> they come in matched pairs and that code_for_load_high isn't a jump
> target?

see my reply to David for the same thing. Short answer is that
sequence of instructions (even if it is a pair of instructions like this)
is very hard to detect in verifier and JITs.
As soon as we give compiler two instructions instead of one,
compiler may optimize them in a fancy ways. Like two loads of
64-bit immediate with upper 32-bit the same, may came out as
4 instructions: load_high, load_low, load_low, mov.
Or in some cases as single load_low, etc.
load 64-bit imm has to stay as single instruction to be verifiable
and patch-able easily.
One can argue: force compiler to emit load_low and load_hi
always together, but then that's exactly what I have. It's a single insn.

> (Something else that I find confusing about eBPF: the instruction
> mnemonics are very strange. Have you considered giving them real
> names? For example, load.imm.low instead of BPF_LD | BPF_DW | BPF_IMM
> is easier to read and pronounce.)

BPF_LD | BPF_DW | BPF_IMM is not really a name. It's macro
for cases when instructions are generated from inside the kernel.
Instructions mnemonics are not defined yet.
llvm emits assembler code like:
bpf_prog2:
ldw r1, 16(r1)
std -8(r10), r1
mov r1, 1
std -16(r10), r1
ld_64 r1, 1
mov r2, r10
addi r2, -8
call 4
jeqi r0, 0 goto .LBB1_2
ldd r1, 0(r0)
addi r1, 1
std 0(r0), r1
.LBB1_3:
mov r0, 0
ret
...
I'm open to change assembler/disassembler mnemonics.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/