Re: [PATCH] perf_events: improve x86 event scheduling (v5)

From: Frederic Weisbecker
Date: Tue Jan 19 2010 - 10:40:31 EST

On Tue, Jan 19, 2010 at 01:22:53PM +0100, Peter Zijlstra wrote:
> On Mon, 2010-01-18 at 18:29 +0100, Frederic Weisbecker wrote:
> > It has constraints that only need to be checked when we register
> > the event. It has also constraint on enable time but nothing
> > tricky that requires an overwritten group scheduling.
> The fact that ->enable() can fail makes it a hardware counter. Software
> counters cannot fail enable.
> Having multiple groups of failable events (multiple hardware pmus) can
> go wrong with the current core in interesting ways, look for example at
> __perf_event_sched_in():
> It does:
> int can_add_hw = 1;
> ...
> list_for_each_entry(event, &ctx->flexible_groups, group_entry) {
> /* Ignore events in OFF or ERROR state */
> if (event->state <= PERF_EVENT_STATE_OFF)
> continue;
> /*
> * Listen to the 'cpu' scheduling filter constraint
> * of events:
> */
> if (event->cpu != -1 && event->cpu != cpu)
> continue;
> if (group_can_go_on(event, cpuctx, can_add_hw))
> if (group_sched_in(event, cpuctx, ctx, cpu))
> can_add_hw = 0;
> }
> Now, if you look at that logic you'll see that it assumes there's one hw
> device since it only has one can_add_hw state. So if your hw_breakpoint
> pmu starts to fail we'll also stop adding counters to the cpu pmu (for
> lack of a better name) and vs.


> This might be fixable by using per-cpu struct pmu variables.

Yeah or just a mask instead of can_add_hw that can testify
a given pmu type couldn't be scheduled anymore?

We can extend struct perf_event:group_flags:

enum perf_group_flag {

Looks easy to maintain and to use for quick bitwise check
on flexible groups scheduling.

> However I'm afraid its far to late to push any of that into .33, which
> means .33 will have rather funny behaviour once the breakpoints start
> getting used.

No. Because for now it is not supposed to happen that a breakpoint
can't be scheduled.

We have constraints that check whether a pinned breakpoint will
always be able to go in, on registration. Similarly we have
constraints for flexible ones, to ensure it's possible for these
to be scheduled. But these are broken because of the non-strict
ordering between pinned and non-pinned events.

Anyway, the point is that for now we treat flexible breakpoints
as if these were pinned, so a breakpoint is not supposed to
be unschedulable, ever. So .33 is not broken.

But we need to fix what you described for .34, because once we
get a strict ordering between pinned and flexible, I'm going
to reactivate the breakpoint constraints for flexibles.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at