Re: [PATCH v3 perf, bpf-next 1/4] perf, bpf: Introduce PERF_RECORD_BPF_EVENT
From: Song Liu
Date: Wed Dec 12 2018 - 12:35:06 EST
> On Dec 12, 2018, at 5:27 AM, Arnaldo Carvalho de Melo <acme@xxxxxxxxxx> wrote:
>
> Em Wed, Dec 12, 2018 at 02:15:49PM +0100, Peter Zijlstra escreveu:
>> On Tue, Dec 11, 2018 at 03:33:47PM -0800, Song Liu wrote:
>>> For better performance analysis of BPF programs, this patch introduces
>>> PERF_RECORD_BPF_EVENT, a new perf_event_type that exposes BPF program
>>> load/unload information to user space.
>>>
>>> Each BPF program may contain up to BPF_MAX_SUBPROGS (256) sub programs.
>>> The following example shows kernel symbols for a BPF program with 7
>>> sub programs:
>>>
>>> ffffffffa0257cf9 t bpf_prog_b07ccb89267cf242_F
>>> ffffffffa02592e1 t bpf_prog_2dcecc18072623fc_F
>>> ffffffffa025b0e9 t bpf_prog_bb7a405ebaec5d5c_F
>>> ffffffffa025dd2c t bpf_prog_a7540d4a39ec1fc7_F
>>> ffffffffa025fcca t bpf_prog_05762d4ade0e3737_F
>>> ffffffffa026108f t bpf_prog_db4bd11e35df90d4_F
>>> ffffffffa0263f00 t bpf_prog_89d64e4abf0f0126_F
>>> ffffffffa0257cf9 t bpf_prog_ae31629322c4b018__dummy_tracepoi
>>
>> Doesn't BPF have enough information to generate 'saner' names? Going by
>> the thing below, these sub-progs are actually functions, right?
>
> Yeah, this looks just like a symbol table, albeit just with functions,
> so far.
>>> /*
>>> * Record different types of bpf events:
>>> * enum perf_bpf_event_type {
>>> * PERF_BPF_EVENT_UNKNOWN = 0,
>>> * PERF_BPF_EVENT_PROG_LOAD = 1,
>>> * PERF_BPF_EVENT_PROG_UNLOAD = 2,
>>> * };
>>> *
>>> * struct {
>>> * struct perf_event_header header;
>>> * u32 type;
>>> * u32 flags;
>>> * u32 id; // prog_id or other id
>>> * u32 sub_id; // subprog id
>>> *
>>> * // for bpf_prog types, bpf prog or subprog
>>> * u8 tag[BPF_TAG_SIZE];
>>> * u64 addr;
>>> * u64 len;
>>> * char name[];
>>> * struct sample_id sample_id;
>>> * };
>>> */
>>
>> Isn't this mixing two different things (poorly)? The kallsym update and
>> the BPF load/unload ?
>>
>> And while this tracks the bpf kallsyms, it does not do all kallsyms.
>>
>> .... Oooh, I see the problem, everybody is doing their own custom
>> kallsym_{add,del}() thing, instead of having that in generic code :-(
>
>> This, for example, doesn't track module load/unload nor ftrace
>> trampolines, even though both affect kallsyms.
>
> So you think the best would have to be a PERF_RECORD_ that just states
> that code has been loaded at range (addr, len). I.e. much like
> PERF_RECORD_MMAP does, just for userspace? Then it could be used by BPF
> and any other kernel facility like the ones you described?
>
> There would be an area that would be used by each of these facilities to
> figure out further info, like we use the mmap->name to go and get the
> symbol table from ELF files, etc, but BPF could use for their specific
> stuff?
>
> The above is almost like PERF_RECORD_MMAP (PERF_BPF_EVENT_PROG_LOAD) +
> PERF_RECORD_MUNMAP(PERF_BPF_EVENT_PROG_UNLOAD) in one event, with this
> 'type' thing allowing for more stuff to be put in later, I guess.
>
> - Arnaldo
PERF_RECORD_MMAP and PERF_RECORD_BPF_EVENT are issued at different
granularities:
One PERF_RECORD_MMAP is issued for each mmap, which may contain
many symbols;
PERF_RECORD_BPF_EVENT is issued for each symbol (sub program).
Unlike mmap to ELF file, where all symbols have static offsets within
the file, different sub programs of a BPF program have their own
page(s) and random starting addresses within the page(s). Therefore, we
cannot have one PERF_RECORD_BPF_EVENT to cover multiple sub programs.
Current version has these mmap-like PERF_RECORD_BPF_EVENT in perf.data,
which is sufficient for basic profiling cases. For annotation and source
line matching, use space need to process PERF_RECORD_BPF_EVENT (from
kernel and synthesized).
Thanks,
Song
>>> +void perf_event_bpf_event_prog(enum perf_bpf_event_type type,
>>> + struct bpf_prog *prog)
>>> +{
>>> + if (!atomic_read(&nr_bpf_events))
>>> + return;
>>> +
>>> + if (type != PERF_BPF_EVENT_PROG_LOAD &&
>>> + type != PERF_BPF_EVENT_PROG_UNLOAD)
>>> + return;
>>> +
>>> + if (prog->aux->func_cnt == 0) {
>>> + perf_event_bpf_event_subprog(type, prog,
>>> + prog->aux->id, 0);
>>> + } else {
>>> + int i;
>>> +
>>> + for (i = 0; i < prog->aux->func_cnt; i++)
>>> + perf_event_bpf_event_subprog(type, prog->aux->func[i],
>>> + prog->aux->id, i);
>>> + }
>>> +}
>>
>
> --
>
> - Arnaldo