Re: [PATCH bpf-next 11/15] bpftool: add skeleton codegen command
From: Stanislav Fomichev
Date: Wed Dec 11 2019 - 12:24:36 EST
On 12/10, Andrii Nakryiko wrote:
> On Tue, Dec 10, 2019 at 2:59 PM Stanislav Fomichev <sdf@xxxxxxxxxxx> wrote:
> >
> > On 12/10, Andrii Nakryiko wrote:
> > > On Tue, Dec 10, 2019 at 1:44 PM Stanislav Fomichev <sdf@xxxxxxxxxxx> wrote:
> > > >
> > > > On 12/10, Jakub Kicinski wrote:
> > > > > On Tue, 10 Dec 2019 09:11:31 -0800, Andrii Nakryiko wrote:
> > > > > > On Mon, Dec 9, 2019 at 5:57 PM Jakub Kicinski wrote:
> > > > > > > On Mon, 9 Dec 2019 17:14:34 -0800, Andrii Nakryiko wrote:
> > > > > > > > struct <object-name> {
> > > > > > > > /* used by libbpf's skeleton API */
> > > > > > > > struct bpf_object_skeleton *skeleton;
> > > > > > > > /* bpf_object for libbpf APIs */
> > > > > > > > struct bpf_object *obj;
> > > > > > > > struct {
> > > > > > > > /* for every defined map in BPF object: */
> > > > > > > > struct bpf_map *<map-name>;
> > > > > > > > } maps;
> > > > > > > > struct {
> > > > > > > > /* for every program in BPF object: */
> > > > > > > > struct bpf_program *<program-name>;
> > > > > > > > } progs;
> > > > > > > > struct {
> > > > > > > > /* for every program in BPF object: */
> > > > > > > > struct bpf_link *<program-name>;
> > > > > > > > } links;
> > > > > > > > /* for every present global data section: */
> > > > > > > > struct <object-name>__<one of bss, data, or rodata> {
> > > > > > > > /* memory layout of corresponding data section,
> > > > > > > > * with every defined variable represented as a struct field
> > > > > > > > * with exactly the same type, but without const/volatile
> > > > > > > > * modifiers, e.g.:
> > > > > > > > */
> > > > > > > > int *my_var_1;
> > > > > > > > ...
> > > > > > > > } *<one of bss, data, or rodata>;
> > > > > > > > };
> > > > > > >
> > > > > > > I think I understand how this is useful, but perhaps the problem here
> > > > > > > is that we're using C for everything, and simple programs for which
> > > > > > > loading the ELF is majority of the code would be better of being
> > > > > > > written in a dynamic language like python? Would it perhaps be a
> > > > > > > better idea to work on some high-level language bindings than spend
> > > > > > > time writing code gens and working around limitations of C?
> > > > > >
> > > > > > None of this work prevents Python bindings and other improvements, is
> > > > > > it? Patches, as always, are greatly appreciated ;)
> > > > >
> > > > > This "do it yourself" shit is not really funny :/
> > > > >
> > > > > I'll stop providing feedback on BPF patches if you guy keep saying
> > > > > that :/ Maybe that's what you want.
> > > > >
> > > > > > This skeleton stuff is not just to save code, but in general to
> > > > > > simplify and streamline working with BPF program from userspace side.
> > > > > > Fortunately or not, but there are a lot of real-world applications
> > > > > > written in C and C++ that could benefit from this, so this is still
> > > > > > immensely useful. selftests/bpf themselves benefit a lot from this
> > > > > > work, see few of the last patches in this series.
> > > > >
> > > > > Maybe those applications are written in C and C++ _because_ there
> > > > > are no bindings for high level languages. I just wish BPF programming
> > > > > was less weird and adding some funky codegen is not getting us closer
> > > > > to that goal.
> > > > >
> > > > > In my experience code gen is nothing more than a hack to work around
> > > > > bad APIs, but experiences differ so that's not a solid argument.
> > > > *nod*
> > > >
> > > > We have a nice set of C++ wrappers around libbpf internally, so we can do
> > > > something like BpfMap<key type, value type> and get a much better interface
> > > > with type checking. Maybe we should focus on higher level languages instead?
> > > > We are open to open-sourcing our C++ bits if you want to collaborate.
> > >
> > > Python/C++ bindings and API wrappers are an orthogonal concerns here.
> > > I personally think it would be great to have both Python and C++
> > > specific API that uses libbpf under the cover. The only debatable
> > > thing is the logistics: where the source code lives, how it's kept in
> > > sync with libbpf, how we avoid crippling libbpf itself because
> > > something is hard or inconvenient to adapt w/ Python, etc.
> >
> > [..]
> > > The problem I'm trying to solve here is not really C-specific. I don't
> > > think you can solve it without code generation for C++. How do you
> > > "generate" BPF program-specific layout of .data, .bss, .rodata, etc
> > > data sections in such a way, where it's type safe (to the degree that
> > > language allows that, of course) and is not "stringly-based" API? This
> > > skeleton stuff provides a natural, convenient and type-safe way to
> > > work with global data from userspace pretty much at the same level of
> > > performance and convenience, as from BPF side. How can you achieve
> > > that w/ C++ without code generation? As for Python, sure you can do
> > > dynamic lookups based on just the name of property/method, but amount
> > > of overheads is not acceptable for all applications (and Python itself
> > > is not acceptable for those applications). In addition to that, C is
> > > the best way for other less popular languages (e.g., Rust) to leverage
> > > libbpf without investing lots of effort in re-implementing libbpf in
> > > Rust.
> > I'd say that a libbpf API similar to dlopen/dlsym is a more
> > straightforward thing to do. Have a way to "open" a section and
> > a way to find a symbol in it. Yes, it's a string-based API,
> > but there is nothing wrong with it. IMO, this is easier to
> > use/understand and I suppose Python/C++ wrappers are trivial.
>
> Without digging through libbpf source code (or actually, look at code,
> but don't run any test program), what's the name of the map
> corresponding to .bss section, if object file is
> some_bpf_object_file.o? If you got it right (congrats, btw, it took me
> multiple attempts to memorize the pattern), how much time did you
> spend looking it up? Now compare it to `skel->maps.bss`. Further, if
> you use anonymous structs for your global vars, good luck maintaining
> two copies of that: one for BPF side and one for userspace.
As your average author of BPF programs I don't really care
which section my symbol ends up into. Just give me an api
to mmap all "global" sections (or a call per section which does all the
naming magic inside) and lookup symbol by name; I can cast it to a proper
type and set it.
RE anonymous structs: maybe don't use them if you want to share the data
between bpf and userspace?
> I never said there is anything wrong with current straightforward
> libbpf API, but I also never said it's the easiest and most
> user-friendly way to work with BPF either. So we'll have both
> code-generated interface and existing API. Furthermore, they are
> interoperable (you can pass skel->maps.whatever to any of the existing
> libbpf APIs, same for progs, links, obj itself). But there isn't much
> that can beat performance and usability of code-generated .data, .bss,
> .rodata (and now .extern) layout.
I haven't looked closely enough, but is there a libbpf api to get
an offset of a variable? Suppose I have the following in bpf.c:
int a;
int b;
Can I get an offset of 'b' in the .bss without manually parsing BTF?
TBH, I don't buy the performance argument for these global maps.
When you did the mmap patchset for the array, you said it yourself
that it's about convenience and not performance.
> > As for type-safety: it's C, forget about it :-)
>
> C is weakly, but still typed language. There are types and they are
> helpful. Yes, you can disregard them and re-interpret values as
> anything, but that's beside the point.
My point was that there is a certain mental model when working
with this type of external symbols which feels "natural" for C
(dlopen/dlsym).
But I agree with you, that as long as code-gen is optional
and there is an alternative api in libbpf, we should be good.