Re: [PATCH perf,bpf 0/5] reveal invisible bpf programs

From: Song Liu
Date: Mon Nov 26 2018 - 15:02:08 EST




> On Nov 26, 2018, at 6:50 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Thu, Nov 22, 2018 at 06:13:32PM +0000, Song Liu wrote:
>> Hi Peter,
>>
>>> On Nov 22, 2018, at 1:32 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>>>
>>> On Wed, Nov 21, 2018 at 11:54:57AM -0800, Song Liu wrote:
>>>> Changes RFC -> PATCH v1:
>>>>
>>>> 1. In perf-record, poll vip events in a separate thread;
>>>> 2. Add tag to bpf prog name;
>>>> 3. Small refactorings.
>>>>
>>>> Original cover letter (with minor revisions):
>>>>
>>>> This is to follow up Alexei's early effort to show bpf programs
>
>>>> In this version, PERF_RECORD_BPF_EVENT is introduced to send real time BPF
>>>> load/unload events to user space. In user space, perf-record is modified
>>>> to listen to these events (through a dedicated ring buffer) and generate
>>>> detailed information about the program (struct bpf_prog_info_event). Then,
>>>> perf-report translates these events into proper symbols.
>>>>
>>>> With this set, perf-report will show bpf program as:
>>>>
>>>> 18.49% 0.16% test [kernel.vmlinux] [k] ksys_write
>>>> 18.01% 0.47% test [kernel.vmlinux] [k] vfs_write
>>>> 17.02% 0.40% test bpf_prog [k] bpf_prog_07367f7ba80df72b_
>>>> 16.97% 0.10% test [kernel.vmlinux] [k] __vfs_write
>>>> 16.86% 0.12% test [kernel.vmlinux] [k] comm_write
>>>> 16.67% 0.39% test [kernel.vmlinux] [k] bpf_probe_read
>>>>
>>>> Note that, the program name is still work in progress, it will be cleaner
>>>> with function types in BTF.
>>>>
>>>> Please share your comments on this.
>>>
>>> So I see:
>>>
>>> kernel/bpf/core.c:void bpf_prog_kallsyms_add(struct bpf_prog *fp)
>>>
>>> which should already provide basic symbol information for extant eBPF
>>> programs, right?
>>
>> Right, if the BPF program is still loaded when perf-report runs, symbols
>> are available.
>
> Good, that is not something that was clear. The Changelog seems to imply
> we need this new stuff in order to observe symbols.

I will clarify this in next version.

>
>>> And (AFAIK) perf uses /proc/kcore for annotate on the current running
>>> kernel (if not, it really should, given alternatives, jump_labels and
>>> all other other self-modifying code).
>>>
>>> So this fancy new stuff is only for the case where your profile spans
>>> eBPF load/unload events (which should be relatively rare in the normal
>>> case, right), or when you want source annotated asm output (I normally
>>> don't bother with that).
>>
>> This patch set adds two pieces of information:
>> 1. At the beginning of perf-record, save info of existing BPF programs;
>> 2. Gather information of BPF programs load/unload during perf-record.
>>
>> (1) is all in user space. It is necessary to show symbols of BPF program
>> that are unloaded _after_ perf-record. (2) needs PERF_RECORD_BPF_EVENT
>> from the ring buffer. It covers BPF program loaded during perf-record
>> (perf record -- bpf_test).
>
> I'm saying that if you given them symbols; most people won't need any of
> that ever.
>
> And just tracking kallsyms is _much_ cheaper than either 1 or 2. Alexei
> was talking fairly big amounts of data per BPF prog. Dumping and saving
> that sounds like pointless overhead for 99% of the users.

Due to the kernel-module-like natural of BPF program, I think it is still
necessary to cover cases that BPF programs are unloaded when perf-record
runs. How about we add another step that scans all bpf_prog_XXXX from
kallsyms, and synthesizes symbols for them?

>
>>> That is; I would really like this fancy stuff to be an optional extra
>>> that is typically not needed.
>>>
>>> Does that make sense?
>>
>> (1) above is always enabled with this set. I added option no-bpf-events
>> to disable (2). I guess you prefer the (2) is disabled by default, and
>> enabled with an option?
>
> I'm saying neither should be default enabled. Instead we should do
> recording and tracking by default.
>
> That gets people symbol information on BPF stuff, which is likely all
> they ever need.

How about we extend PERF_RECORD_BPF_EVENT with basic symbol information
(name, addr, length)? By default, we just record these events in the ring
buffer, just like mmap events. This (plus scanning kallsyms before record)
will enable symbols for all BPF programs.

For more information, we add an option to enable more information
(annotated asm, etc.), with dedicated dummy event, thread, and ring buffer.

Thanks,
Song