Re: [PATCH v2 1/3] perf/core: Add a tracepoint for perf sampling

From: Peter Zijlstra
Date: Mon Aug 08 2016 - 06:04:00 EST


On Fri, Aug 05, 2016 at 12:45:07PM -0700, Alexei Starovoitov wrote:

> Also thinking about concurrency and the need to remember the original
> handler somewhere, would it be cleaner api to add a bit to perf_event_attr
> and use attr.config1 as bpf_fd ?

attr.config[12] are in use already, typically uncore events need them.
We cannot rely on those bits being unused.

> Then perf_event_open at event creation time will use bpf prog as
> overflow_handler. That solves concurrency concerns and potential semantical
> issues if we go with ioctl() approach.

> Like if we perf_event_open() an event for a task, then bpf attach to it,
> what children task and corresponding inherited events suppose to do?
> Inherit overflow_handler, right? but then deatch of bpf in the parent
> suppose to clear it in inherited events as well. A bit complicated.
> I guess we can define it that way.
> Just seems easier to do bpf attach at perf_event_open time only.

Which is why I would've liked BPF to create its own events, instead of
userspace passing them in through this array.

Because now you have a chicken'n'egg issue with that you want BPF to use
the overflow handler but need BPF running before you have an actual
handler to link to.

> > Urgh, does it have to be stable API? Can't we simply rely on the kernel
> > headers to provide the right structure definition?
>
> yes we can. The concern is about assumptions people will make about
> perf_sample_data and the speed of access to it. From bpf program point
> of view the pointer to perf_sample_data will be opaque unsafe pointer,
> so any access to fields would have to be done via bpf_probe_read which
> has non-trivial overhead.
> If we go with the uapi mirror of perf_sample_data approach, it will be
> fast, since mirror is not an actual struct. Like the 'struct __sk_buff' we
> have in uapi/linux/bpf.h is a meta structure. It's not allocated anywhere
> and no fields are copied. When bpf program does 'skb->vlan_present'
> the verifier rewrites it at load time into corresponding access to
> kernel internal 'struct sk_buff' fields with bitmask, shifts and such.
> For this case we can define something like
> struct bpf_perf_sample_data {
> __u64 period;
> };
> then bpf prog will only be able to access that signle field which verifier
> will translate into 'data->period' where data is 'struct perf_sample_data *'
> Later we can add other fields if necessary. The kernel is free to mess
> around with perf_sample_data whichever way without impacting bpf progs.

Hmm, I was not aware of that. Should be doable indeed.