Re: [PATCH] perf_events: improve x86 event scheduling (v5)

From: Stephane Eranian
Date: Mon Jan 18 2010 - 09:13:06 EST

On Mon, Jan 18, 2010 at 2:54 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Mon, 2010-01-18 at 14:43 +0100, Frederic Weisbecker wrote:
>> Shouldn't we actually use the core based pmu->enable()/disable()
>> model called from kernel/perf_event.c:event_sched_in(),
>> like every other event, where we can fill up the queue of hardware
>> events to be scheduled, and then call a hw_check_constraints()
>> when we finish scheduling a group?
> Well the thing that makes hw_perf_group_sched_in() useful is that you
> can add a bunch of events and not have to reschedule for each one, but
> instead do a single schedule pass.
That's right.

> That said you do have a point, maybe we can express this particular
> thing differently.. maybe a pre and post group call like:
>   void hw_perf_group_sched_in_begin(struct pmu *pmu)
>   int  hw_perf_group_sched_in_end(struct pmu *pmu)
The issue with hw_perf_group_sched_in() is that, because we do not know
when we are done scheduling, we have to defer actual activation until
hw_perf_enable(). But we still have to mark the events as ACTIVE,
otherwise things go wrong in the generic layer and for non-PMU events.
That leads to partial duplication of event_sched_in()/event_sched_out()
in the PMU-specific layer.
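
To make the begin/end idea concrete, here is a rough, self-contained
sketch of how such a bracket could behave. It is only a toy model, not
kernel code: struct sketch_pmu, sketch_enable() and the SKETCH_ names
are made up for illustration, and the constraint check is reduced to
counting free counters, which is far simpler than the real x86
constraints.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NUM_COUNTERS 4   /* assumed number of hardware counters */

/* Hypothetical PMU state; not the kernel's struct pmu. */
struct sketch_pmu {
	bool   in_group;   /* true between begin and end */
	size_t n_queued;   /* events collected for the current group */
	size_t n_active;   /* events already accepted on the PMU */
};

/* Open a group: from here on, enable only queues events. */
static void sketch_group_sched_in_begin(struct sketch_pmu *pmu)
{
	pmu->in_group = true;
	pmu->n_queued = 0;
}

/* Per-event enable. Inside a group the availability test is delayed. */
static int sketch_enable(struct sketch_pmu *pmu)
{
	if (pmu->in_group) {
		pmu->n_queued++;
		return 0;
	}
	if (pmu->n_active >= SKETCH_NUM_COUNTERS)
		return -1;
	pmu->n_active++;
	return 0;
}

/*
 * Close a group: run one schedulability check for everything queued.
 * On failure nothing is accepted, and the caller is expected to roll
 * the group back.
 */
static int sketch_group_sched_in_end(struct sketch_pmu *pmu)
{
	int ret = 0;

	pmu->in_group = false;
	if (pmu->n_active + pmu->n_queued > SKETCH_NUM_COUNTERS)
		ret = -1;
	else
		pmu->n_active += pmu->n_queued;
	pmu->n_queued = 0;
	return ret;
}
```

In this model the single check in sketch_group_sched_in_end() plays the
role of the one schedule pass, and the generic code would only need to
react to its return value.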

As Frederic pointed out, the more natural way would be to simply rely
on event_sched_in()/event_sched_out() and the rollback logic, and just
drop hw_perf_group_sched_in(), which is there as an optimization and
not for correctness. Scheduling can be done incrementally from the
event_sched_in() function.
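
As a rough illustration of incremental scheduling with rollback, here is
a small standalone sketch. Everything in it is hypothetical (the toy_
types and functions, the 4-counter limit, the bitmask constraints), and
the greedy first-fit placement is a stand-in for the real constraint
handling, not what the patch does.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NUM_COUNTERS 4   /* assumed number of hardware counters */

/* Hypothetical event: 'allowed' is a bitmask of usable counters. */
struct toy_event {
	unsigned int allowed;
	int counter;          /* assigned counter, or -1 when inactive */
};

/* Hypothetical PMU state: bitmask of counters currently in use. */
struct toy_pmu {
	unsigned int used;
};

/* Place one event on the first free counter it is allowed to use. */
static bool toy_event_sched_in(struct toy_pmu *pmu, struct toy_event *ev)
{
	for (int i = 0; i < TOY_NUM_COUNTERS; i++) {
		unsigned int bit = 1u << i;

		if ((ev->allowed & bit) && !(pmu->used & bit)) {
			pmu->used |= bit;
			ev->counter = i;
			return true;
		}
	}
	return false;
}

/* Release the counter held by an event, if any. */
static void toy_event_sched_out(struct toy_pmu *pmu, struct toy_event *ev)
{
	if (ev->counter >= 0) {
		pmu->used &= ~(1u << ev->counter);
		ev->counter = -1;
	}
}

/*
 * Schedule a group one event at a time. On the first failure, undo
 * the events already placed so the group is all-or-nothing.
 */
static bool toy_group_sched_in(struct toy_pmu *pmu,
			       struct toy_event *grp, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!toy_event_sched_in(pmu, &grp[i])) {
			while (i-- > 0)
				toy_event_sched_out(pmu, &grp[i]);
			return false;
		}
	}
	return true;
}
```

The rollback loop lives in the group-level function, which stands in
for the generic code here; the per-event functions never need to know
that they are part of a group.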

> That way we know we need to track more state for rollback and can give
> the pmu implementation leeway to delay scheduling/availability tests.
Rollback would still be handled by the generic code, wouldn't it?

> Paul, would that work for you too?
> Then there's still the question of having events of multiple hw pmus in
> a single group, I'd be perfectly fine with saying that's not allowed,
> what do others think?
I have seen requests for measuring both core and uncore PMU events
together, for instance. It all depends on how the uncore PMU will be managed.