Re: [PATCH v3 linux-trace 4/8] samples: bpf: simple tracing example in C

From: Steven Rostedt
Date: Tue Feb 10 2015 - 07:23:47 EST


Added Linus because he's the one that would revert changes on breakage.

On Mon, 9 Feb 2015 21:45:21 -0800
Alexei Starovoitov <ast@xxxxxxxxxxxx> wrote:

> On Mon, Feb 9, 2015 at 9:16 PM, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> > On Mon, 9 Feb 2015 23:08:36 -0500
> > Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> >
> >> I don't want to get stuck with pinned kernel data structures again. We
> >> had 4 blank bytes of data for every event, because latency top hard
> >> coded the field. Luckily, the 64 bit / 32 bit interface caused latency
> >> top to have to use the event_parse code to work, and we were able to
> >> remove that field after it was converted.
>
> I think your main point boils down to:
>
> > But I still do not want any hard coded event structures. All access to
> > data from the binary code must be parsed by looking at the event/format
> > files. Otherwise you will lock internals of the kernel as userspace
> > ABI, because eBPF programs will break if those internals change, and
> > that could severely limit progress in the future.
>
> and I completely agree.
>
> Patch 4 is an example. It doesn't mean in any way
> that the structs defined here are an ABI.
> To be compatible across kernels, user space must read
> the format file, as you mentioned in your other reply.

The thing is, this is a sample. Which means it will be cut and pasted
into other programs. If the sample does not follow the way we want
users to use this, then how can we complain if they hard code it as
well?
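
For illustration, here is a minimal sketch of the lookup a sample could do
instead of hard coding a struct. It assumes the text of an event's format
file (events/<system>/<event>/format in the tracing directory) has already
been read into a string, and that each field line has the usual
"field:<declaration>; offset:N; size:N; signed:N;" layout. The helper name
find_field() is hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find one field in the text of a tracing "format"
 * file and report where it lives in the raw record.
 * Returns 0 and fills *offset and *size on success, -1 if not found.
 */
static int find_field(const char *format_text, const char *name,
		      int *offset, int *size)
{
	const char *line = format_text;
	size_t name_len = strlen(name);

	assert(format_text && name && offset && size);

	while (line && *line) {
		const char *eol = strchr(line, '\n');
		const char *field = strstr(line, "field:");

		/* Only consider a "field:" entry that sits on this line. */
		if (field && (!eol || field < eol)) {
			const char *semi = strchr(field, ';');
			const char *end = semi;

			/* Skip an array suffix such as "[16]". */
			if (end && end[-1] == ']')
				while (end > field && *end != '[')
					end--;

			if (end && (size_t)(end - field) > name_len) {
				/* The name is the last token of the declaration. */
				const char *cand = end - name_len;
				char before = cand[-1];

				if (!strncmp(cand, name, name_len) &&
				    (before == ' ' || before == '\t' ||
				     before == '*') &&
				    sscanf(semi, ";\toffset:%d;\tsize:%d;",
					   offset, size) == 2)
					return 0;
			}
		}
		line = eol ? eol + 1 : NULL;
	}
	return -1;
}
```

A reader of the raw record would then copy size bytes from data + offset
rather than dereferencing a locally defined struct, so a change in the
event layout shows up as a different offset, or as a failed lookup,
rather than as silent garbage.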

>
> > I'm wondering if we should label eBPF programs as "modules". That is,
> > they have no guarantee of working from one kernel to the next. They
> > execute in the kernel, thus they are very similar to modules.
> >
> > If we can get Linus to say that eBPF programs are not user space, and
> > that they are treated the same as modules (no internal ABI), then I
> > think we can be a bit more free at what we allow.
>
> I thought we already stated that.
> Here is the quote from perf_event.h:
> * # The RAW record below is opaque data wrt the ABI
> * #
> * # That is, the ABI doesn't make any promises wrt to
> * # the stability of its content, it may vary depending
> * # on event, hardware, kernel version and phase of
> * # the moon.
> * #
> * # In other words, PERF_SAMPLE_RAW contents are not an ABI.
>
> and this example is reading PERF_SAMPLE_RAW events and
> uses locally defined structs to print them for simplicity.

As we found out the hard way with latencytop, comments like this do
not matter. If an application does something like this, it's our fault
if it breaks later. We can't say "hey, you were supposed to do it this
way". That argument breaks down even more if our own examples do not
follow the way we want others to do things.

-- Steve
