Re: [PATCH bpf-next] bpf: Add bpf_read_raw_record() helper

From: Song Liu
Date: Fri Aug 26 2022 - 16:52:21 EST

> On Aug 26, 2022, at 12:21 PM, Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
>
> On Fri, Aug 26, 2022 at 11:09 AM Song Liu <songliubraving@xxxxxx> wrote:
>>
>>> On Aug 26, 2022, at 9:33 AM, Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
>>>
>>> On Thu, Aug 25, 2022 at 10:53 PM Song Liu <song@xxxxxxxxxx> wrote:
>>>>
>>>> On Thu, Aug 25, 2022 at 10:22 PM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
>>>>>
>>>>> On Thu, Aug 25, 2022 at 7:35 PM Song Liu <songliubraving@xxxxxx> wrote:
>>>>>> Actually, since we are on this, can we make it more generic, and handle
>>>>>> all possible PERF_SAMPLE_* (in enum perf_event_sample_format)? Something
>>>>>> like:
>>>>>>
>>>>>> long bpf_perf_event_read_sample(void *ctx, void *buf, u64 size, u64 flags);
>>>>>>
>>>>>> WDYT Namhyung?
>>>>>
>>>>> Do you mean reading the whole sample data at once?
>>>>> Then it needs to parse the sample data format properly,
>>>>> which is non-trivial due to a number of variable-length
>>>>> fields like callchains, branch stack, etc.
>>>>>
>>>>> Also I'm afraid I might need event configuration info
>>>>> other than sample data like attr.type, attr.config,
>>>>> attr.sample_type and so on.
>>>>>
>>>>> Hmm.. maybe we can add it to the ctx directly like ctx.attr_type?
>>>>
>>>> The user should have access to the perf_event_attr used to
>>>> create the event. This is also available in ctx->event->attr.
>>>
>>> Do you mean from BPF? I'd like to have a generic BPF program
>>> that can handle various filtering according to the command line
>>> arguments. I'm not sure but it might do something differently
>>> for each event based on the attr settings.
>>
>> Yeah, we can access perf_event_attr from BPF program. Note that
>> the ctx for perf_event bpf program is struct bpf_perf_event_data_kern:
>>
>> SEC("perf_event")
>> int perf_e(struct bpf_perf_event_data_kern *ctx)
>> {
>>         ...
>> }
>>
>> struct bpf_perf_event_data_kern {
>>         bpf_user_pt_regs_t *regs;
>>         struct perf_sample_data *data;
>>         struct perf_event *event;
>> };
>
> I didn't know that it's allowed to access the kernel data directly.
> For some reason, I thought it should use fields in bpf_perf_event_data
> only, like sample_period and addr. And the verifier will convert the
> access to them according to pe_prog_convert_ctx_access().

We can bypass pe_prog_convert_ctx_access() with something like:

	struct perf_event *event;
	u64 config;

	/* Read the kernel pointer ctx->event, then follow it to attr.config. */
	bpf_probe_read_kernel(&event, sizeof(void *), &ctx->event);
	bpf_probe_read_kernel(&config, sizeof(u64), &event->attr.config);
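
To make the two-step read above easier to follow outside the kernel,
here is a rough userspace sketch. Everything suffixed with _mock, the
mock_probe_read_kernel() function, and read_event_config() are
hypothetical stand-ins invented for illustration; they only model the
fields mentioned in this thread and are not the kernel definitions. The
mock reader simply copies bytes, whereas the real
bpf_probe_read_kernel() can fail on a bad address.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel structs (assumption: only the
 * fields discussed in this thread are modeled). */
struct perf_event_attr_mock {
	uint32_t type;
	uint64_t config;
	uint64_t sample_type;
};

struct perf_event_mock {
	struct perf_event_attr_mock attr;
};

struct ctx_mock {
	void *regs;
	void *data;
	struct perf_event_mock *event;
};

/* Stand-in for bpf_probe_read_kernel(): copies size bytes from src to
 * dst and always returns 0. */
static long mock_probe_read_kernel(void *dst, uint32_t size, const void *src)
{
	memcpy(dst, src, size);
	return 0;
}

/* Step 1: copy the event pointer out of the context.
 * Step 2: follow that pointer and copy attr.config.
 * Returns 0 on success, or the first non-zero error code. */
static long read_event_config(struct ctx_mock *ctx, uint64_t *config)
{
	struct perf_event_mock *event;
	long err;

	err = mock_probe_read_kernel(&event, sizeof(event), &ctx->event);
	if (err)
		return err;

	return mock_probe_read_kernel(config, sizeof(*config),
				      &event->attr.config);
}
```

In a real perf_event program the same shape would apply with
bpf_probe_read_kernel() and the kernel types, and the return values
should be checked the same way before using the result.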

Thanks,
Song