Re: [PATCH bpf] bpf: Add LBR data to BPF_PROG_TYPE_PERF_EVENT prog context

From: Daniel Xu
Date: Mon Dec 16 2019 - 14:36:03 EST


On Fri Dec 6, 2019 at 9:10 AM, Andrii Nakryiko wrote:
> On Thu, Dec 5, 2019 at 4:13 PM Daniel Xu <dxu@xxxxxxxxx> wrote:
> >
> > Last Branch Record (LBR) is an Intel CPU feature that can be configured
> > to record certain branches taken during code execution. This data is
> > particularly interesting for profile-guided optimization (PGO). perf has
> > had LBR support for a while, but its data collection can be rather
> > coarse-grained.
> >
> > We (Facebook) have recently run many experiments feeding filtered LBR
> > data to various PGO pipelines. We've seen good results (+2.5%
> > throughput with lower CPU utilization and lower latency) on a
> > request-oriented service by feeding the compiler LBR branches from
> > high-latency requests. We used bpf to read a special request context
> > ID (which is how we associate branches with latency) from a fixed
> > userspace address. Reading from that fixed address at sample time is
> > why bpf support is useful.
> >
> > Aside from this particular use case, having LBR data available to bpf
> > progs can be useful to get stack traces out of userspace applications
> > that omit frame pointers.
> >
> > This patch adds support for LBR data to bpf perf progs.
> >
> > Some notes:
> > * We use `__u64 entries[BPF_MAX_LBR_ENTRIES * 3]` instead of
> > `struct perf_branch_entry entries[BPF_MAX_LBR_ENTRIES]` (each entry is
> > three __u64 words) because checkpatch.pl warns about including a uapi
> > header from another uapi header.
> >
> > * We define BPF_MAX_LBR_ENTRIES as 32 (instead of using the value from
> > arch/x86/events/perf_event.h) because including arch-specific headers
> > seems wrong and could introduce circular header includes.
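To illustrate why a flat `__u64` array can stand in for the struct array: uapi `struct perf_branch_entry` is two `__u64` addresses (`from`, `to`) followed by one 64-bit word of packed flag bitfields, so each record occupies exactly three `__u64` slots. The userspace sketch below assumes that layout; `struct lbr_entry_mirror` and `lbr_entry_at()` are hypothetical names, not kernel API, and the flag bits are left undecoded because bitfield ordering is compiler-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same value the patch hard-codes in include/uapi/linux/bpf_perf_event.h. */
#define BPF_MAX_LBR_ENTRIES 32

/*
 * Hypothetical userspace mirror of uapi struct perf_branch_entry:
 * from/to addresses plus one word holding the packed flag bitfields
 * (mispred, predicted, in_tx, abort, cycles, type, reserved).
 */
struct lbr_entry_mirror {
	uint64_t from;
	uint64_t to;
	uint64_t flags; /* raw bitfield word, not decoded here */
};

/* The flat-array layout only works if one entry is exactly three u64s. */
_Static_assert(sizeof(struct lbr_entry_mirror) == 3 * sizeof(uint64_t),
	       "branch entry must span three 64-bit words");

/*
 * Return record i from a flat entries[BPF_MAX_LBR_ENTRIES * 3] array.
 * memcpy sidesteps the strict-aliasing questions a pointer cast raises
 * in plain userspace C.
 */
static struct lbr_entry_mirror lbr_entry_at(const uint64_t *entries, size_t i)
{
	struct lbr_entry_mirror e;

	assert(i < BPF_MAX_LBR_ENTRIES);
	memcpy(&e, &entries[i * 3], sizeof(e));
	return e;
}
```

For example, `lbr_entry_at(entries, 1).to` yields the branch target of the second record, i.e. word `entries[4]` of the flat array.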
> >
> > Signed-off-by: Daniel Xu <dxu@xxxxxxxxx>
> > ---
> > include/uapi/linux/bpf_perf_event.h | 5 ++++
> > kernel/trace/bpf_trace.c | 39 +++++++++++++++++++++++++++++
> > 2 files changed, 44 insertions(+)
> >
> > diff --git a/include/uapi/linux/bpf_perf_event.h b/include/uapi/linux/bpf_perf_event.h
> > index eb1b9d21250c..dc87e3d50390 100644
> > --- a/include/uapi/linux/bpf_perf_event.h
> > +++ b/include/uapi/linux/bpf_perf_event.h
> > @@ -10,10 +10,15 @@
> >
> > #include <asm/bpf_perf_event.h>
> >
> > +#define BPF_MAX_LBR_ENTRIES 32
> > +
> > struct bpf_perf_event_data {
> > bpf_user_pt_regs_t regs;
> > __u64 sample_period;
> > __u64 addr;
> > + __u64 nr_lbr;
> > + /* Cast to struct perf_branch_entry* before using */
> > + __u64 entries[BPF_MAX_LBR_ENTRIES * 3];
> > };
> >
>
>
> I wonder if, instead of hard-coding this in bpf_perf_event_data, we
> could achieve this (and perhaps even more flexibility) by letting users
> access the underlying bpf_perf_event_data_kern and use CO-RE to read
> whatever needs to be read from perf_sample_data, perf_event, etc.
> Would that work?
>
>
> > #endif /* _UAPI__LINUX_BPF_PERF_EVENT_H__ */
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index ffc91d4935ac..96ba7995b3d7 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
>
>
> [...]
>

Sorry about the late response. I chatted with Andrii last week and spent
some time playing with alternatives. It turns out we can read LBR data
by casting the bpf_perf_event_data ctx to the internal kernel data
structure (struct bpf_perf_event_data_kern) and doing a few well-placed
bpf_probe_read() calls.
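The cast-and-read approach can be sketched in plain userspace C. The structs below are hypothetical, heavily simplified stand-ins for the kernel-internal `struct bpf_perf_event_data_kern`, `struct perf_sample_data` and `struct perf_branch_stack` (real field offsets must come from the running kernel's headers or from CO-RE), and `mock_probe_read()` only imitates `bpf_probe_read()`'s contract of copying `size` bytes and zeroing the destination on failure. Each pointer hop (ctx to sample data, sample data to branch stack, branch stack to its records) is a separate read, since in this setup the prog cannot dereference those kernel pointers directly.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_LBR 32 /* assumption: mirrors BPF_MAX_LBR_ENTRIES from the patch */

/* Hypothetical, simplified mirrors of kernel-internal structures. */
struct branch_entry { uint64_t from, to, flags; };
struct branch_stack {
	uint64_t nr;
	struct branch_entry entries[MAX_LBR]; /* kernel uses a flexible array */
};
struct sample_data {
	uint64_t other_fields[4];      /* stand-in for fields before br_stack */
	struct branch_stack *br_stack; /* may be NULL if no branch stack */
};
struct perf_event_data_kern {
	void *regs;
	struct sample_data *data;
	void *event;
};

/* Mimics bpf_probe_read(): copy size bytes, zero dst and fail on NULL src. */
static int mock_probe_read(void *dst, uint32_t size, const void *src)
{
	if (!src) {
		memset(dst, 0, size);
		return -EFAULT;
	}
	memcpy(dst, src, size);
	return 0;
}

/* Copy up to max branch records; returns the count or a negative errno. */
static int read_lbr(const void *ctx, struct branch_entry *out, uint32_t max)
{
	/* The "cast": treat the prog ctx as the kernel-internal struct. */
	const struct perf_event_data_kern *kctx = ctx;
	struct sample_data *data;
	struct branch_stack *bs;
	uint64_t nr;

	if (max > MAX_LBR)
		max = MAX_LBR; /* never read past the mirror's entries[] */
	if (mock_probe_read(&data, sizeof(data), &kctx->data) || !data)
		return -EFAULT;
	if (mock_probe_read(&bs, sizeof(bs), &data->br_stack))
		return -EFAULT;
	if (!bs)
		return 0; /* no branch records captured for this sample */
	if (mock_probe_read(&nr, sizeof(nr), &bs->nr))
		return -EFAULT;
	if (nr > max)
		nr = max; /* bound the copy, as a verifier-friendly prog must */
	if (mock_probe_read(out, (uint32_t)(nr * sizeof(*out)), bs->entries))
		return -EFAULT;
	return (int)nr;
}
```

In a real prog each `mock_probe_read()` would be a `bpf_probe_read()` against kernel memory, and the offsets of `data`, `br_stack` and `nr` would have to match the target kernel, which is where the CO-RE suggestion above comes in.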

Unless someone else thinks this patch would be useful, I will probably
abandon it for now (and revisit it if these casts become painful
enough). If I did a v2, I would probably add a bpf helper instead of
modifying the ctx, to get around the ugly API limitations.

Daniel