Re: [PATCH] perf_events: improve x86 event scheduling (v5)

From: Peter Zijlstra
Date: Tue Jan 19 2010 - 08:25:30 EST


On Tue, 2010-01-19 at 13:22 +0100, Peter Zijlstra wrote:
> On Mon, 2010-01-18 at 18:29 +0100, Frederic Weisbecker wrote:
>
> > It has constraints that only need to be checked when we register
> > the event. It has also constraint on enable time but nothing
> > tricky that requires an overwritten group scheduling.
>
> The fact that ->enable() can fail makes it a hardware counter. Software
> counters cannot fail enable.
>
> Having multiple groups of failable events (multiple hardware pmus) can
> go wrong with the current core in interesting ways, look for example at
> __perf_event_sched_in():
>
> It does:
>
> int can_add_hw = 1;
>
> ...
>
> list_for_each_entry(event, &ctx->flexible_groups, group_entry) {
> /* Ignore events in OFF or ERROR state */
> if (event->state <= PERF_EVENT_STATE_OFF)
> continue;
> /*
> * Listen to the 'cpu' scheduling filter constraint
> * of events:
> */
> if (event->cpu != -1 && event->cpu != cpu)
> continue;
>
> if (group_can_go_on(event, cpuctx, can_add_hw))
> if (group_sched_in(event, cpuctx, ctx, cpu))
> can_add_hw = 0;
> }
>
> Now, if you look at that logic you'll see that it assumes there's one hw
> device since it only has one can_add_hw state. So if your hw_breakpoint
> pmu starts to fail we'll also stop adding counters to the cpu pmu (for
> lack of a better name) and vs.
>
> This might be fixable by using per-cpu struct pmu variables.
>
> I'm going to try and move all the weak hw_perf_* functions into struct
> pmu and create a notifier like callchain for them so we can have proper
> per pmu state, and then use that to fix these things up.
>
> However I'm afraid its far to late to push any of that into .33, which
> means .33 will have rather funny behaviour once the breakpoints start
> getting used.

Hrmph, so I read some of that hw_breakpoint stuff, and now I'm sorta
confused, it looks like ->enable should never fail, but that means you
cannot overcommit breakpoints, which doesn't fit the perf model nicely.

Also, I see you set an ->unthrottle, but then don't implement it, but
comment it as todo, which is strange because that implies its broken. If
there's an ->unthrottle method it will throttle, so if its todo, the
safest thing is to not set it.

/me mutters something and goes look at something else for a while.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/