Re: [RFC perf,bpf 1/5] perf, bpf: Introduce PERF_RECORD_BPF_EVENT

From: Song Liu
Date: Thu Nov 08 2018 - 13:05:34 EST


Hi Peter,

> On Nov 8, 2018, at 7:00 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Wed, Nov 07, 2018 at 06:25:04PM +0000, Song Liu wrote:
>>
>>
>>> On Nov 7, 2018, at 12:40 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>>>
>>> On Tue, Nov 06, 2018 at 12:52:42PM -0800, Song Liu wrote:
>>>> For better performance analysis of BPF programs, this patch introduces
>>>> PERF_RECORD_BPF_EVENT, a new perf_event_type that exposes BPF program
>>>> load/unload information to user space.
>>>>
>>>> /*
>>>> * Record different types of bpf events:
>>>> * enum perf_bpf_event_type {
>>>> * PERF_BPF_EVENT_UNKNOWN = 0,
>>>> * PERF_BPF_EVENT_PROG_LOAD = 1,
>>>> * PERF_BPF_EVENT_PROG_UNLOAD = 2,
>>>> * };
>>>> *
>>>> * struct {
>>>> * struct perf_event_header header;
>>>> * u16 type;
>>>> * u16 flags;
>>>> * u32 id; // prog_id or map_id
>>>> * };
>>>> */
>>>> PERF_RECORD_BPF_EVENT = 17,
>>>>
>>>> PERF_RECORD_BPF_EVENT contains minimal information about the BPF program.
>>>> Perf utility (or other user space tools) should listen to this event and
>>>> fetch more details about the event via BPF syscalls
>>>> (BPF_PROG_GET_FD_BY_ID, BPF_OBJ_GET_INFO_BY_FD, etc.).
>>>
>>> Why !? You're failing to explain why it cannot provide the full
>>> information there.
>>
>> Aha, I missed this part. I will add the following to the next version.
>> Please let me know if anything is not clear.
>
>>
>> This design was chosen for the following reasons. First, BPF programs
>> could be loaded-and-jited and/or unloaded before/during/after the
>> perf-record run. Once a BPF program is unloaded, it is impossible to
>> recover details of the program. It is impossible to provide the
>> information through a simple key (like the build ID). Second, BPF prog
>> annotation is under rapid development. More information will be
>> added to bpf_prog_info in the next few releases. Including all the
>> information of a BPF program in the perf ring buffer requires frequent
>> changes to the perf ABI, and thus makes it very difficult to manage
>> compatibility of the perf utility.
>
> So I don't agree with that reasoning. If you want symbol information
> you'll just have to commit to some form of ABI. That bpf_prog_info is an
> ABI too.

At the beginning of the perf-record run, perf needs to query bpf_prog_info
for the already loaded BPF programs. Therefore, we need to commit to the
bpf_prog_info ABI anyway. If we also include full information about the BPF
program in the perf ring buffer, we will commit to TWO ABIs.
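
To make the split concrete, here is a rough user-space sketch of how small
the ring buffer side stays with this design. It is only an illustration, not
code from the series: the record layout and enum values are copied from the
comment in this patch, while every sketch_* name (the stand-in header struct,
the action enum, and the handler) is made up for this mail. The stand-in
header assumes the usual u32 type / u16 misc / u16 size layout of
struct perf_event_header. The actual bpf_prog_info lookup via
BPF_PROG_GET_FD_BY_ID and BPF_OBJ_GET_INFO_BY_FD is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct perf_event_header so the sketch builds on its own. */
struct sketch_event_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

/* Values copied from the comment in this patch. */
enum perf_bpf_event_type {
	PERF_BPF_EVENT_UNKNOWN     = 0,
	PERF_BPF_EVENT_PROG_LOAD   = 1,
	PERF_BPF_EVENT_PROG_UNLOAD = 2,
};

#define SKETCH_RECORD_BPF_EVENT 17	/* PERF_RECORD_BPF_EVENT in this RFC */

/* The whole record: a fixed 16 bytes, independent of bpf_prog_info. */
struct sketch_bpf_event {
	struct sketch_event_header header;
	uint16_t type;
	uint16_t flags;
	uint32_t id;	/* prog_id or map_id */
};

static_assert(sizeof(struct sketch_bpf_event) == 16, "fixed-size record");

/* Hypothetical: what the tool decides to do after seeing one record. */
enum sketch_action {
	SKETCH_IGNORE = 0,
	SKETCH_FETCH_INFO,	/* query bpf_prog_info while the prog is loaded */
	SKETCH_MARK_UNLOADED,	/* keep the details fetched at load time */
};

/* Decode one raw record from buf; on success store the object id in *id. */
enum sketch_action sketch_handle_bpf_event(const void *buf, size_t len,
					   uint32_t *id)
{
	struct sketch_bpf_event ev;

	if (len < sizeof(ev))
		return SKETCH_IGNORE;
	memcpy(&ev, buf, sizeof(ev));	/* avoid unaligned access */

	if (ev.header.type != SKETCH_RECORD_BPF_EVENT ||
	    ev.header.size < sizeof(ev))
		return SKETCH_IGNORE;

	*id = ev.id;
	switch (ev.type) {
	case PERF_BPF_EVENT_PROG_LOAD:
		return SKETCH_FETCH_INFO;
	case PERF_BPF_EVENT_PROG_UNLOAD:
		return SKETCH_MARK_UNLOADED;
	default:
		return SKETCH_IGNORE;
	}
}
```

The point of the sketch is that nothing in the record has to change when
bpf_prog_info grows; only the code behind SKETCH_FETCH_INFO would.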

Also, perf-record writes the events to the perf.data file, so the data needs
to be serialized. This is implemented in patch 4/5. To include the data in
the ring buffer, we would need another piece of code in the kernel to do the
same serialization work.

On the other hand, processing BPF load/unload events synchronously should
not introduce too much overhead for meaningful use cases. If many BPF progs
are being loaded/unloaded within a short period of time, that is not the
steady state that profiling work cares about.

Would this address your concerns?

Thanks,
Song