Re: [PATCH 2/6] tracing/profile: Add filter support

From: Frederic Weisbecker
Date: Tue Sep 08 2009 - 08:34:01 EST


On Tue, Sep 08, 2009 at 10:35:45AM +0200, Peter Zijlstra wrote:
> On Tue, 2009-09-08 at 04:01 +0200, Frederic Weisbecker wrote:
> > You may need to get the current perf context that can
> > be found in current->perf_counter_ctxp and then iterate
> > through the counter_list of this ctx to find the current counter
> > attached to this tracepoint (using the event id).
> >
> > What is not nice is that we need to iterate in O(n), n being the
> > number of tracepoint counters attached to the current counter
> > context.
> >
> > So to avoid the following costly sequence in the tracing fastpath:
> >
> > - deref ctx->current->perf_counter_ctxp
> > - list every ctx->counter_list
> > - find the counter that matches
> > - deref counter->filter and test...
> >
> > You could keep the profile_filter field (and profile_filter_active)
> > in struct ftrace_event_call but allocate them per cpu and
> > write these fields for a given event each time we enter/exit a
> > counter context that has a counter that uses this given event.
>
> How would that work when you have two counters of the same type in one
> context with different filter expressions?


Oh so we can do that? That's what I wondered about.



> > That's something we could do by using a struct pmu specific for
> > tracepoints. More precisely with enable/disable callbacks that would do
> > specific things and then rely on the perf_ops_generic pmu
> > callbacks.
> >
> > the struct pmu::enable()/disable() callbacks are functions that are called
> > each time we schedule in/out a task group that has a counter that
> > uses the given pmu.
> > Ie: they are called each time we schedule in/out a counter.
> >
> > So you have a struct ftrace_event_call. This event can be used in
> > several different counter instances at the same time. But on a given cpu,
> > only one of these counters can be currently in use.
>
> Not so, you can have as many counters as you want on any one particular
> cpu. There is nothing that stops:
>
> perf record -e timer:hrtimer_start -e timer:hrtimer_start -e
> timer:hrtimer_start ...
>
> from working, now add a different filter to each of those counter and
> enjoy ;-)
>


Then maybe we can do it here:

static void perf_swcounter_ctx_event(struct perf_counter_context *ctx,
                                     enum perf_type_id type,
                                     u32 event, u64 nr, int nmi,
                                     struct perf_sample_data *data)
{
        struct perf_counter *counter;

        list_for_each_entry_rcu(counter, &ctx->event_list, event_entry) {
                if (perf_swcounter_match(counter, type, event, data->regs))
                        perf_swcounter_add(counter, nr, nmi, data);
        }
}


If we have two instances of the same counter id in the ctx, the sample
event will be added to both of them there.

What we need is to apply the filter there, because at this stage we are
aware of the individual counter instances.

We can do our filter check inside perf_swcounter_match(). We just
need to have the filter and the ftrace event call as
struct hw_perf_counter fields and call filter_match_preds() from
perf_swcounter_match() (if the previous match tests have succeeded).
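
Something like this, as a very rough sketch (tp_event/tp_filter are
made-up field names, this assumes a filter_match_preds() flavour that
takes the counter's own filter rather than the ftrace_event_call, and
the caller would pass down the whole perf_sample_data instead of just
data->regs so that the raw record is reachable):

/*
 * Rough sketch only, nothing here is in the tree as-is.
 */
struct hw_perf_counter {
        /* ... existing fields ... */
        struct ftrace_event_call        *tp_event;      /* tracepoint counters only */
        struct event_filter             *tp_filter;     /* per-counter filter, NULL if none */
};

static int perf_swcounter_match(struct perf_counter *counter,
                                enum perf_type_id type, u32 event,
                                struct perf_sample_data *data)
{
        struct pt_regs *regs = data->regs;

        if (counter->attr.type != type)
                return 0;
        if (counter->attr.config != event)
                return 0;

        if (regs) {
                if (counter->attr.exclude_user && user_mode(regs))
                        return 0;
                if (counter->attr.exclude_kernel && !user_mode(regs))
                        return 0;
        }

        /*
         * The previous match tests succeeded, now check the per-counter
         * tracepoint filter against the raw record, if there is one.
         */
        if (counter->hw.tp_filter && data->raw &&
            !filter_match_preds(counter->hw.tp_filter, data->raw->data))
                return 0;

        return 1;
}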


> I've been thinking of replacing that linear list with a better lookup,
> like maybe an RB-tree or hash table, because we hit that silly O(n) loop
> on every software event.


Yeah that indeed is also a problem :)
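
If we went that way, I imagine something roughly like this (purely
illustrative, none of these names exist): per-context buckets keyed on
(type, config), filled by the swcounter pmu ->enable()/->disable()
callbacks, so that perf_swcounter_ctx_event() only walks the counters
that can possibly match instead of the whole event_list.

#define SWCOUNTER_HASH_BITS     6
#define SWCOUNTER_HASH_SIZE     (1 << SWCOUNTER_HASH_BITS)

struct perf_counter_context {
        /* ... existing fields ... */
        struct hlist_head       swcounter_hash[SWCOUNTER_HASH_SIZE];
};

/*
 * struct perf_counter would grow a 'struct hlist_node hash_entry;' that
 * the swcounter pmu enable/disable callbacks add to / remove from the
 * bucket returned below.
 */
static struct hlist_head *
swcounter_hash_bucket(struct perf_counter_context *ctx,
                      enum perf_type_id type, u64 config)
{
        u64 key = ((u64)type << 32) ^ config;

        return &ctx->swcounter_hash[hash_64(key, SWCOUNTER_HASH_BITS)];
}

static void perf_swcounter_ctx_event(struct perf_counter_context *ctx,
                                     enum perf_type_id type,
                                     u32 event, u64 nr, int nmi,
                                     struct perf_sample_data *data)
{
        struct hlist_head *head = swcounter_hash_bucket(ctx, type, event);
        struct perf_counter *counter;
        struct hlist_node *node;

        hlist_for_each_entry_rcu(counter, node, head, hash_entry) {
                if (perf_swcounter_match(counter, type, event, data->regs))
                        perf_swcounter_add(counter, nr, nmi, data);
        }
}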
