Re: [PATCH v3 net-next 1/1] bpf32->bpf64 mapper and bpf64 interpreter

From: Daniel Borkmann
Date: Fri Feb 28 2014 - 22:23:33 EST


On 02/28/2014 09:53 PM, Alexei Starovoitov wrote:
On Fri, Feb 28, 2014 at 4:45 AM, Daniel Borkmann <dborkman@xxxxxxxxxx> wrote:
...
Did you also test that seccomp-BPF still works out?

yes. Have a prototype, but it needs a bit more cleanup.

Here's [1] actually some code snippet for user space for prctl(). The
libseccomp [2] actually wraps around that and makes usage easier.

[1] http://outflux.net/teach-seccomp/
[2] http://lwn.net/Articles/491308/

...
We should keep naming consistent (so either extended BPF or BPF64),
so maybe bpf_ext_enable ? I'd presume rather {bpf,sk_filter}*_ext

agree. we need consistent naming for both (old and new).
I'll try all-out rename of bpf_*() into sk_filter64_*() and sk_filter_ext_*()
to see which one looks better.
I'm kinda leaning to sk_filter64, since it's easier to quickly spot
the difference
and more descriptive.

Just saw your second email regarding sk_filter_ext() et al, yep, I agree.

as in 'struct bpf_insn' the immediate value is 32 bit, so for 64 bit
comparisons, you'd still need to load to immediate values, right?

there is no insn that use 64-bit immediate, since 64-bit immediates
are extremely rare. grep x86-64 asm code for movabsq will return very few.
llvm or gcc can easily construct any constant by combination of mov,
shifts and ors.
bpf64 comparisons are all 64-bit right now. So far I didn't see a need to do
32-bit comparison, since old bpf is all unsigned, mapping 32->64 of
Jxx is painless.

Hm, fair enough, I was just thinking for comparisons of IPv6 addresses
when we do socket filtering. On the other hand, old and new insns are
both 64 bit wide and can be used though the same api then.

Notice I added signed comparison, since real life programs cannot do
without them.
I also kept the spirit of old bpf having > and >= only. (no < and <=)
that made llvm/gcc backends a bit harder to do, since no real cpu has
such limitations.

Hehe, I proposed them once, but for low level BPF it was just easier to
adjust jt/jf offsets differently to achieve the same.

I'm still contemplating do add < and <= (both signed and unsigned), which is
interesting trade-off: number of instructions vs complexity of compiler

After all your changes, we will still have the bpf_jit_enable knob
in tact, right?

Yes.
In this diff the workflow is the following:

old filter comes through sk_attach_filter() or sk_unattached_filter_create()
if (bpf64) {
convert to new
sk_chk_filter() - check old bpf
use new interpreter
} else {
sk_chk_filter() - check old bpf
if (bpf_jit_enable)
use old jit
else
use old interpreter
}
soon I'll add bpf64 jit and will reuse the same bpf_jit_enable knob for it.
then add new/old inband demux into sk_attach_filter(),
so that workflow will become:

a filter comes through sk_attach_filter() or sk_unattached_filter_create()
if (new filter prog) {
sk_chk_filter64() - check new bpf
if (bpf_jit_enable)
use new jit
else
use new interpreter
} else {
if (bpf64_enable) {
convert to new
sk_chk_filter() - check old bpf
if (bpf_jit_enable)
use new jit
else
use new interpreter
} else {
sk_chk_filter()
if (bpf_jit_enable)
use old jit
else
use old interpreter
}
}
eventually bpf64_enable can be made default,
the last 'else' can be retired and 'bpf64_enable' removed along with
old interpreter.

bpf_jit_enable knob will stay for foreseeable future.

Okay, cool. I think it seems reasonable to keep this knob around anyway.
E.g. for seccomp some people might argue speed is important, maybe other
more security related distros might not want to rely on JIT and therefore
trade performance.

...
Why would that need to be exported as a symbol?

the performance numbers I mentioned are from bpf_bench that is done
as kernel module, so I used this for debugging from it.
also to see what execution traces I get while running trinity bpf fuzzer :)

I would actually like to avoid having this pr_info* inside the kernel.
Couldn't this be done e.g. through systemtap script that could e.g. be
placed under tools/net/ or inside the documentation file?

like the idea!
Will drop it from the diff and eventually will move it to tools/net.

Sounds great!

Thanks,

Daniel
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/