Re: [RFC PATCH v2 tip 0/7] 64-bit BPF insn set and tracing filters
From: Alexei Starovoitov
Date: Thu Feb 13 2014 - 23:47:29 EST
On Thu, Feb 13, 2014 at 12:20 PM, Daniel Borkmann <dborkman@xxxxxxxxxx> wrote:
> On 02/07/2014 02:20 AM, Alexei Starovoitov wrote:
> ...
>>
>> Hi Daniel,
>
>
> Thanks for your answer and sorry for the late reply.
>
>
>> Thank you for taking a look. Good questions. I had the same concerns.
>> Old BPF was carefully extended in specific places.
>> End result may look big at first glance, but every extension has specific
>> reason behind it. I tried to explain the reasoning in
>> Documentation/bpf_jit.txt
>>
>> I'm planning to write an on-the-fly converter from old BPF to BPF64
>> when BPF64 manages to demonstrate that it is equally safe.
>> It is straight forward to convert. Encoding is very similar.
>> Core concepts are the same.
>> Try diff include/uapi/linux/filter.h include/linux/bpf.h
>> to see how much is reused.
>>
>> I believe that old BPF outlived itself and BPF64 should
>> replace it in all current use cases plus a lot more.
>> It just cannot happen at once.
>> BPF64 can come in. bpf32->bpf64 converter functioning.
>> JIT from bpf64->aarch64 and may be sparc64 needs to be in place.
>> Then old bpf can fade away.
>
>
> Do you see a possibility to integrate your work step by step? That is,
Sure. let's see how we can do it.
> to first integrate the interpreter part only; meaning, to detect "old"
> BPF programs e.g. coming from SO_ATTACH_FILTER et al and run them in
> compatibility mode while extended BPF is fully integrated and replaces
> the old engine in net/core/filter.c. Maybe, "old" programs can be
do you mean drop bfp64_jit, checker and just have bpf32->bpf64 converter
and bpf64 interpreter as phase 1 ?
Checking is done by old bpf32,
all existing bpf32 jits, if available, can convert bpf32 to native,
but interpreter will be running on bpf64 ?
phase 2 to introduce bpf64_x86 jit and so on?
Sounds fine.
Today I didn't try to optimize bpf64 interpreter, since insn set is designed
for eventual JITing and interpreter is there to support archs that don't
have jit yet.
I guess I have to tweak it to perform at bpf32 interpreter speeds.
> transformed transparently to the new representation and then would be
> good to execute in eBPF. If possible, in such a way that in the first
> step JIT compilers won't need any upgrades. Once that is resolved,
> JIT compilers could successively migrate, arch by arch, to compile the
> new code? And last but not least the existing tools as well for handling
> eBPF. I think, if possible, that would be great. Also, I unfortunately
> haven't looked into your code too deeply yet due to time constraints,
> but I'm wondering e.g. for accessing some skb fields we currently use
> the "hack" to "overload" load instructions with negative arguments. Do
> we have a sort of "meta" instruction that is extendible in eBPF to avoid
> such things in future?
Exactly.
This 'negative offset' hack of bpf32 isn't very clean, since jits for all archs
need to change when new offsets added.
For bpf64 I'm proposing a customizable 'bpf_context' and variable set
of bpf-callable functions, so JITs don't need to change and verifier
stays the same.
That's the idea behind 'bpf_callbacks' in include/linux/bpf_jit.h
Some meta data makes sense to pass as input into bpf program.
Like for seccomp 'bpf_context' can be 'struct seccomp_data'
For networking, bpf_context can be 'skb',
then bpf_s_anc_protocol becomes a normal 2-byte bpf64 load
from skb->protocol field. Allowing access to other fields of skb
is just a matter of defining permissions of 'struct bpf_context' in
bpf_callback->get_context_access()
Some other meta data and extensions are cleaner when defined
as function calls from bpf, since calls are free.
I think bpf_table_lookup() is a fundamental one that allows to define
arbitrary tables within bpf and access them from the program.
(here I need feedback the most whether to access tables
via netlink from userspace or via debugfs...)
It probably will be easier to read the code of bpf32-bpf64 converter
to understand the differences between the two.
I guess I have to start working on the converter sooner than I thought...
Thanks
Alexei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/