Re: [PATCH perf,bpf 0/5] reveal invisible bpf programs

From: Song Liu
Date: Tue Nov 27 2018 - 14:26:31 EST




> On Nov 26, 2018, at 6:50 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Thu, Nov 22, 2018 at 06:13:32PM +0000, Song Liu wrote:
>> Hi Peter,
>>
>>> On Nov 22, 2018, at 1:32 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>>>
>>> On Wed, Nov 21, 2018 at 11:54:57AM -0800, Song Liu wrote:
>>>> Changes RFC -> PATCH v1:
>>>>
>>>> 1. In perf-record, poll vip events in a separate thread;
>>>> 2. Add tag to bpf prog name;
>>>> 3. Small refactorings.
>>>>
>>>> Original cover letter (with minor revisions):
>>>>
>>>> This is to follow up Alexei's early effort to show bpf programs
>
>>>> In this version, PERF_RECORD_BPF_EVENT is introduced to send real time BPF
>>>> load/unload events to user space. In user space, perf-record is modified
>>>> to listen to these events (through a dedicated ring buffer) and generate
>>>> detailed information about the program (struct bpf_prog_info_event). Then,
>>>> perf-report translates these events into proper symbols.
>>>>
>>>> With this set, perf-report will show bpf program as:
>>>>
>>>> 18.49% 0.16% test [kernel.vmlinux] [k] ksys_write
>>>> 18.01% 0.47% test [kernel.vmlinux] [k] vfs_write
>>>> 17.02% 0.40% test bpf_prog [k] bpf_prog_07367f7ba80df72b_
>>>> 16.97% 0.10% test [kernel.vmlinux] [k] __vfs_write
>>>> 16.86% 0.12% test [kernel.vmlinux] [k] comm_write
>>>> 16.67% 0.39% test [kernel.vmlinux] [k] bpf_probe_read
>>>>
>>>> Note that, the program name is still work in progress, it will be cleaner
>>>> with function types in BTF.
>>>>
>>>> Please share your comments on this.
>>>
>>> So I see:
>>>
>>> kernel/bpf/core.c:void bpf_prog_kallsyms_add(struct bpf_prog *fp)
>>>
>>> which should already provide basic symbol information for extant eBPF
>>> programs, right?
>>
>> Right, if the BPF program is still loaded when perf-report runs, symbols
>> are available.
>
> Good, that is not something that was clear. The Changelog seems to imply
> we need this new stuff in order to observe symbols.
>
>>> And (AFAIK) perf uses /proc/kcore for annotate on the current running
>>> kernel (if not, it really should, given alternatives, jump_labels and
>>> all other other self-modifying code).
>>>
>>> So this fancy new stuff is only for the case where your profile spans
>>> eBPF load/unload events (which should be relatively rare in the normal
>>> case, right), or when you want source annotated asm output (I normally
>>> don't bother with that).
>>
>> This patch set adds two pieces of information:
>> 1. At the beginning of perf-record, save info of existing BPF programs;
>> 2. Gather information of BPF programs load/unload during perf-record.
>>
>> (1) is all in user space. It is necessary to show symbols of BPF program
>> that are unloaded _after_ perf-record. (2) needs PERF_RECORD_BPF_EVENT
>> from the ring buffer. It covers BPF program loaded during perf-record
>> (perf record -- bpf_test).
>
> I'm saying that if you given them symbols; most people won't need any of
> that ever.
>
> And just tracking kallsyms is _much_ cheaper than either 1 or 2. Alexei
> was talking fairly big amounts of data per BPF prog. Dumping and saving
> that sounds like pointless overhead for 99% of the users.

If annotation is not needed, we have the option to reduce the amount of
data per bpf prog by not requesting JITed binaries. Would a flag to tune
this solve the concern of dumping and saving too much data?

Thanks,
Song