Re: [PATCH v5 perf, bpf-next 3/7] perf, bpf: introduce PERF_RECORD_BPF_EVENT

From: Song Liu
Date: Tue Jan 08 2019 - 18:54:59 EST

> On Jan 8, 2019, at 11:43 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Tue, Jan 08, 2019 at 07:10:20PM +0000, Song Liu wrote:
>>> On Jan 8, 2019, at 10:41 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>>> On Thu, Dec 20, 2018 at 10:29:00AM -0800, Song Liu wrote:
>>>> @@ -986,9 +987,35 @@ enum perf_event_type {
>>>> */
>>>> PERF_RECORD_KSYMBOL = 17,
>>>>
>>>> + /*
>>>> + * Record bpf events:
>>>> + * enum perf_bpf_event_type {
>>>> + * PERF_BPF_EVENT_UNKNOWN = 0,
>>>> + * PERF_BPF_EVENT_PROG_LOAD = 1,
>>>> + * PERF_BPF_EVENT_PROG_UNLOAD = 2,
>>>> + * };
>>>> + *
>>>> + * struct {
>>>> + * struct perf_event_header header;
>>>> + * u16 type;
>>>> + * u16 flags;
>>>> + * u32 id;
>>>> + * u8 tag[BPF_TAG_SIZE];
>>>> + * struct sample_id sample_id;
>>>> + * };
>>>> + */
>>>> + PERF_RECORD_BPF_EVENT = 18,
>>>> +
>>>
>>> Elsewhere today, I raised the point that by the time (however short
>>> interval) userspace gets around to reading this event, the actual
>>> program could be gone again.
>>>
>>> In this case the program has been with us for a very short period
>>> indeed; but it could still have generated some samples or otherwise
>>> generated trace data.
>>
>> Since we already have the separate KSYMBOL events, BPF_EVENT is only
>> required for advanced use cases, like annotation. So I guess missing
>> it for very short-lived programs should not be a huge problem?
>>
>>> It was suggested to allow pinning modules/programs to avoid this
>>> situation, but that of course has other undesirable effects, such as a
>>> trivial DoS.
>>>
>>> A truly horrible hack would be to include an open filedesc in the event
>>> that needs closing to release the resource, but I'm sorry for even
>>> suggesting that **shudder**.
>>>
>>> Do we have any sane ideas?
>>
>> How about we gate the open filedesc solution with an option, and limit
>> that option to root only? If this still sounds hacky, maybe we should
>> just accept that short-lived programs may be missed?
>
> I'm afraid we might also 'need' this for the kallsym thing.
>
> The problem is that things like Intel PT (ARM Coresight too IIRC) encode
> a bitstream of branch-taken decisions. The only way to decode that and
> reconstruct the actual code-flow is with an exact matching text image.
>
> In order to have this matching text we need to be able to copy out every
> piece of dynamic text (from kcore) that has ever executed before it
> disappears.
>
> Elsewhere (*), Andi suggests having a kind of text-free fence
> interface, where userspace can signal completion. And I suppose as long as we
> know there is a consumer, we also know we'll not be blocked
> indefinitely. So it would have to be slightly more complicated than
> suggested, but I think that is something we could work with.
>
> It would also not complicate these events.
>
>
>
> [*] https://lkml.kernel.org/r/20190108172721.GN6118@xxxxxxxxxxxxxxxxxxxx
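To make sure I read the fence suggestion correctly, here is a minimal userspace model of it. Everything in it (struct text_fence, fence_register(), fence_unregister(), fence_retire(), fence_complete(), fence_try_free(), MAX_CONSUMERS) is hypothetical and not an existing kernel interface. It only assumes that retired dynamic text stays around until every registered consumer has signalled completion past it, and that it is freed right away when no consumer is registered:

```c
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL model of a "text-free fence"; not a kernel API. */
#define MAX_CONSUMERS 4

/* A piece of dynamic text that has been unloaded but maybe not freed. */
struct retired_text {
	uint64_t retire_seq;	/* sequence number at which it was retired */
	bool     freed;
};

struct text_fence {
	uint64_t seq;				/* global retire counter */
	uint64_t acked[MAX_CONSUMERS];		/* last seq each consumer completed */
	bool     active[MAX_CONSUMERS];		/* registered consumers */
};

/* A new consumer does not hold back text retired before it registered. */
static void fence_register(struct text_fence *f, int consumer)
{
	f->active[consumer] = true;
	f->acked[consumer] = f->seq;
}

/* A consumer that goes away stops blocking frees. */
static void fence_unregister(struct text_fence *f, int consumer)
{
	f->active[consumer] = false;
}

/* Mark a text region as retired; returns its sequence number. */
static uint64_t fence_retire(struct text_fence *f, struct retired_text *t)
{
	t->retire_seq = ++f->seq;
	t->freed = false;
	return t->retire_seq;
}

/* Consumer reports it has copied out all text retired up to 'seq'. */
static void fence_complete(struct text_fence *f, int consumer, uint64_t seq)
{
	if (f->active[consumer] && seq > f->acked[consumer])
		f->acked[consumer] = seq;
}

/* Free only once every active consumer has completed past this text.
 * With no active consumers the text is freed immediately. */
static bool fence_try_free(const struct text_fence *f, struct retired_text *t)
{
	for (int i = 0; i < MAX_CONSUMERS; i++)
		if (f->active[i] && f->acked[i] < t->retire_seq)
			return false;
	t->freed = true;
	return true;
}
```

In this model only registered consumers can hold text back, so a tool that never registers costs nothing, and one that exits drops out via fence_unregister(); that is the "not blocked indefinitely" property as I understand it.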

I think the Intel PT case is at instruction granularity (rather than ksymbol
granularity)? If so, modules, BPF, and PT could still share the ksymbol
record for basic profiling, while advanced use cases like annotation would
depend on user space recording BPF_EVENT (and its equivalents for other
cases) in a timely manner. But at least the ksymbol record is already there.
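Roughly, a consumer would then split the two records like the sketch below. It assumes the record layout and type values proposed in this patch (PERF_RECORD_KSYMBOL = 17, PERF_RECORD_BPF_EVENT = 18, BPF_TAG_SIZE = 8); the names struct rec_header, struct bpf_event_body, enum action and handle_record() are hypothetical. KSYMBOL alone is enough to map sample addresses to names, while a PROG_LOAD BPF_EVENT is the cue to look up the program by id, which may already fail for a short-lived program:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Record type values as proposed in this patch series. */
#define REC_KSYMBOL	17
#define REC_BPF_EVENT	18
#define BPF_TAG_SIZE	8	/* same value as BPF_TAG_SIZE in <linux/bpf.h> */

enum bpf_event_type {
	BPF_EV_UNKNOWN     = 0,
	BPF_EV_PROG_LOAD   = 1,
	BPF_EV_PROG_UNLOAD = 2,
};

/* Mirror of struct perf_event_header. */
struct rec_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;	/* total record size, including this header */
};

/* Fixed part of the proposed PERF_RECORD_BPF_EVENT record. The trailing
 * sample_id is variable-length (depends on sample_type) and is skipped. */
struct bpf_event_body {
	struct rec_header header;
	uint16_t type;
	uint16_t flags;
	uint32_t id;
	uint8_t  tag[BPF_TAG_SIZE];
};

/* HYPOTHETICAL consumer decisions. */
enum action {
	ACT_IGNORE,
	ACT_ADD_SYMBOL,		/* basic profiling: address range -> name */
	ACT_FETCH_PROG_INFO,	/* annotation: look up prog by id, promptly */
	ACT_DROP_PROG_INFO,
};

static enum action handle_record(const void *buf, size_t len)
{
	struct rec_header hdr;
	struct bpf_event_body ev;

	if (len < sizeof(hdr))
		return ACT_IGNORE;
	memcpy(&hdr, buf, sizeof(hdr));	/* avoid unaligned access */
	if ((size_t)hdr.size < sizeof(hdr) || (size_t)hdr.size > len)
		return ACT_IGNORE;

	switch (hdr.type) {
	case REC_KSYMBOL:
		/* Independent of BPF_EVENT: symbols survive a missed BPF_EVENT. */
		return ACT_ADD_SYMBOL;
	case REC_BPF_EVENT:
		if ((size_t)hdr.size < sizeof(ev))
			return ACT_IGNORE;
		memcpy(&ev, buf, sizeof(ev));
		if (ev.type == BPF_EV_PROG_LOAD)
			return ACT_FETCH_PROG_INFO;	/* prog may already be gone */
		if (ev.type == BPF_EV_PROG_UNLOAD)
			return ACT_DROP_PROG_INFO;
		return ACT_IGNORE;
	default:
		return ACT_IGNORE;
	}
}
```

If the lookup after a PROG_LOAD fails, the tool loses annotation for that program but still has its ksymbol for basic profiling.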

Does this make sense?

Thanks,
Song