Re: [PATCH v2 1/3] perf/core: Add a tracepoint for perf sampling

From: Brendan Gregg
Date: Fri Aug 05 2016 - 13:22:44 EST

Next message: Vincent Brillault: "Re: kernel/printk/printk.c: Invalid access when buffer wraps around?"
Previous message: Diana Madalina Craciun: "Re: [PATCH v12 0/8] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 3/3: vfio changes"
In reply to: Peter Zijlstra: "Re: [PATCH v2 1/3] perf/core: Add a tracepoint for perf sampling"
Next in thread: Alexei Starovoitov: "Re: [PATCH v2 1/3] perf/core: Add a tracepoint for perf sampling"

On Fri, Aug 5, 2016 at 3:52 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Thu, Aug 04, 2016 at 10:24:06PM -0700, Alexei Starovoitov wrote:
>> tracepoints are actually zero overhead already via static-key mechanism.
>> I don't think Peter's objection for the tracepoint was due to overhead.
>
> Almost 0, they still have some I$ footprint, but yes. My main worry is
> that we can feed tracepoints into perf, so having tracepoints in perf is
> tricky.

Coincidentally, I$ footprint was my most recent use case for needing
this: I have an I$-busting workload and want to profile
instructions at a very high rate to get a breakdown of I$ population.
(Normally I'd use I$ miss overflow, but none of our Linux systems have
PMCs: cloud.)

> I also don't much like this tracepoint being specific to the hrtimer
> bits, I can well imagine people wanting to do the same thing for
> hardware based samples or whatnot.

Sure, which is why I thought we'd have two in a perf category. I'm all
for PMC events, even though we can't currently use them!

>
>> > The perf:perf_hrtimer probe point is also reading state mid-way
>> > through a function, so it's not quite as simple as wrapping the
>> > function pointer. I do like that idea, though, but for things like
>> > struct file_operations.
>
> So what additional state do you need?

I was pulling in regs after get_irq_regs(), and struct perf_event *event
after it's populated. Not that hard to duplicate. Just noting that it
didn't map directly to the function entry.

I wanted perf_event just for event->ctx->task->pid, so that a BPF
program can differentiate between its samples and other concurrent
sessions.

(I was thinking of changing my patch to expose pid_t instead of
perf_event, since I was noticing it didn't add many instructions.)
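
As a rough illustration of the pid filtering idea, here is a minimal
user-space sketch in plain C. It is not the patch and not a BPF program:
struct regs_stub, struct timer_sample_args, and sample_is_mine() are all
hypothetical names that only model "regs plus pid" being handed to a
program, which then drops samples from other sessions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for the kernel's struct pt_regs; only ip is modeled. */
struct regs_stub {
    unsigned long ip;
};

/* Hypothetical probe arguments: registers plus the pid that owns the event. */
struct timer_sample_args {
    const struct regs_stub *regs;
    pid_t pid;
};

/* Return true when the sample belongs to the session identified by my_pid. */
bool sample_is_mine(const struct timer_sample_args *args, pid_t my_pid)
{
    assert(args != NULL);
    return args->pid == my_pid;
}
```

A program would call sample_is_mine() first and return early for
samples that belong to some other concurrent session.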

[...]
>> instead of adding a tracepoint to perf_swevent_hrtimer we can replace
>> overflow_handler for that particular event with some form of bpf wrapper.
>> (probably new bpf program type). Then not only periodic events
>> will be triggering bpf prog, but pmu events as well.
>
> Exactly.

Although the timer use case is a bit different, and is via
hwc->hrtimer.function = perf_swevent_hrtimer.
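
To make the wrapper idea concrete, here is a small user-space model in
plain C of replacing a handler pointer with one that runs an attached
program first and then the saved original. Everything in it is an
assumption for illustration: struct event_stub, run_attached_prog(),
wrapped_overflow(), and attach_wrapper() are hypothetical and are not
kernel APIs. It only models the overflow_handler style of hook; the
hrtimer path would presumably need its own equivalent.

```c
#include <assert.h>
#include <stddef.h>

struct event_stub;
typedef void (*overflow_fn)(struct event_stub *ev);

/* Hypothetical, simplified model of an event with a replaceable handler. */
struct event_stub {
    overflow_fn overflow_handler; /* currently installed handler */
    overflow_fn orig_handler;     /* saved handler, called by the wrapper */
    unsigned long prog_runs;      /* times the attached program ran */
    unsigned long orig_runs;      /* times the original handler ran */
};

/* Stand-in for the handler that was installed before wrapping. */
void default_overflow(struct event_stub *ev)
{
    ev->orig_runs++;
}

/* Stand-in for running an attached program on the sample. */
void run_attached_prog(struct event_stub *ev)
{
    ev->prog_runs++;
}

/* Wrapper: run the program first, then fall through to the saved handler. */
void wrapped_overflow(struct event_stub *ev)
{
    run_attached_prog(ev);
    if (ev->orig_handler)
        ev->orig_handler(ev);
}

/* Install the wrapper once; a second call is a no-op to avoid recursion. */
void attach_wrapper(struct event_stub *ev)
{
    assert(ev != NULL);
    if (ev->overflow_handler == wrapped_overflow)
        return;
    ev->orig_handler = ev->overflow_handler;
    ev->overflow_handler = wrapped_overflow;
}
```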

[...]
>> The question is what to pass into the
>> program to make the most use out of it. 'struct pt_regs' is done deal.
>> but perf_sample_data we cannot pass as-is, since it's kernel internal.
>
> Urgh, does it have to be stable API? Can't we simply rely on the kernel
> headers to provide the right structure definition?

For the timer it can be: struct pt_regs, pid_t.

So that would restrict your BPF program to one timer, since if you had
two (from one pid) you couldn't tell them apart. But I'm not sure of a
use case for two in-kernel timers. If there were, we could also add
struct perf_event_attr, which has enough info to tell things apart,
and is already exposed to user space.
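
A sketch of how attr could tell two timers from the same pid apart, in
plain user-space C. struct attr_stub is a hypothetical three-field
subset that borrows the field names type, config, and sample_period
from struct perf_event_attr (the real structure has many more fields),
and same_event() is a made-up helper, not an existing function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of struct perf_event_attr, for illustration only. */
struct attr_stub {
    uint32_t type;          /* event class, e.g. software vs hardware */
    uint64_t config;        /* event selector within that class */
    uint64_t sample_period; /* sampling interval */
};

/* Two samples come from the same event if all modeled fields match. */
bool same_event(const struct attr_stub *a, const struct attr_stub *b)
{
    assert(a != NULL && b != NULL);
    return a->type == b->type &&
           a->config == b->config &&
           a->sample_period == b->sample_period;
}
```

With only pid_t, two timers that differ just in sample_period would
look identical to the program; comparing the attr fields separates them.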

I haven't looked into the PMU arguments, but perhaps that could be:
struct pt_regs, pid_t, struct perf_event_attr.

Thanks,

Brendan
