Re: [PATCH v3 perf, bpf-next 1/4] perf, bpf: Introduce PERF_RECORD_BPF_EVENT

From: Arnaldo Carvalho de Melo
Date: Fri Dec 14 2018 - 08:49:05 EST


Em Thu, Dec 13, 2018 at 09:48:57PM +0000, Song Liu escreveu:
>
>
> > On Dec 13, 2018, at 10:45 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Wed, Dec 12, 2018 at 01:33:20PM -0500, Steven Rostedt wrote:
> >> On Wed, 12 Dec 2018 19:05:53 +0100
> >> Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >>
> >>> On Wed, Dec 12, 2018 at 05:09:17PM +0000, Song Liu wrote:
> >>>>> And while this tracks the bpf kallsyms, it does not do all kallsyms.
> >>>>>
> >>>>> .... Oooh, I see the problem, everybody is doing their own custom
> >>>>> kallsym_{add,del}() thing, instead of having that in generic code :-(
> >>>>>
> >>>>> This, for example, doesn't track module load/unload nor ftrace
> >>>>> trampolines, even though both affect kallsyms.
> >>>>
> >>>> I think we can use PERF_RECORD_MMAP(or MMAP2) for module load/unload.
> >>>> That could be separate sets of patches.
> >>>
> >>> So I would actually like to move bpf_lock/bpf_kallsyms/bpf_tree +
> >>> bpf_prog_kallsyms_*() + __bpf_address_lookup() into kernel/kallsyms.c
> >>> and also have ftrace use that.
> >>>
> >>> Because currently the ftrace stuff is otherwise invisible.
> >>>
> >>> A generic kallsym register/unregister for any JIT.
> >>
> >> That's if it needs to look up the symbols that were recorded when init
> >> was unloaded.
> >>
> >> The ftrace kallsyms is used to save the function names of init code
> >> that was freed, but may have been recorded. With out the ftrace
> >> kallsyms the functions traced at init time would just show up as hex
> >> addresses (not very useful).
> >>
> >> I'm not sure how BPF would need those symbols unless they were executed
> >> during init (module or core) and needed to see what the symbols use to
> >> be).
> >
> > Aah, that sounds entirely dodgy and possibly quite broken. We freed that
> > init code, so BPF or your trampolines (or a tiny module) could actually
> > fit in there and insert their own kallsyms, and then we have overlapping
> > symbols, which would be pretty bad.
> >
> > I thought the ftrace kallsym stuff was for the trampolines, which would
> > be fairly similar to what BPF is doing. And why I'm trying to get a
> > generic dynamic kallsym thing sorted. There's bound the be other
> > code-gen things at some point.
>
> Hi Peter,
>
> I guess you are looking for something for all ksym add/delete events, like;
>
> /*
> * PERF_RECORD_KSYMBOL
> *
> * struct {
> * struct perf_event_header header;
> * u64 addr;
> * u32 len;
> * u16 ksym_type;
> * u16 flags;
> * char name[];
> * struct sample_id sample_id;
> * };
> */

Can't this reuse PERF_RECORD_MMAP2 with some bit in the header to mean
that the name is the symbol name, not a path to some ELF/whatever? The
ksym type could be encoded in the prot field, PROT_EXEC for functions,
PROT_READ for read only data, PROT_WRITE for rw data.

If we do it that way older tools will show the DSO name and an
unresolved symbol, and even an indication if its a function or data,
which is better than not showing anything when processing a new
PERF_RECORD_KSYMBOL.

New tools, seeing the perf_event_attr.header bit will know that this is
a "map" with just one symbol and will show that for both DSO name and
symbol.

> We can use ksym_type to encode BPF_EVENT, trampolines, or other type of ksym.
> We can use flags or header.misc to encode ksym add/delete. Is this right?
>
> If we go this direction, shall we reserve a few more bytes in it for different
> types to use, like:
>
> /*
> * PERF_RECORD_KSYMBOL
> *
> * struct {
> * struct perf_event_header header;
> * u64 addr;
> * u32 len;
> * u16 ksym_type;
> * u16 flags;
> * u64 data[2];
> * char name[];
> * struct sample_id sample_id;
> * };
> */
>
> Thanks,
> Song
>

--

- Arnaldo