Re: [PATCH] perf_events: improve x86 event scheduling (v6 incremental)

From: Peter Zijlstra
Date: Mon Jan 25 2010 - 13:00:01 EST

On Mon, 2010-01-25 at 18:48 +0100, stephane eranian wrote:

> >> It seems a solution would be to call x86_pmu_disable() before
> >> assigning an event to a new counter for all events which are
> >> moving. This is because we cannot assume all events have been
> >> previously disabled individually. Something like
> >>
> >> if (!match_prev_assignment(hwc, cpuc, i)) {
> >> if (hwc->idx != -1)
> >> x86_pmu.disable(hwc, hwc->idx);
> >> x86_assign_hw_event(event, cpuc, cpuc->assign[i]);
> >> x86_perf_event_set_period(event, hwc, hwc->idx);
> >> }
> >
> > Yes and no, my worry is not that its not counting, but that we didn't
> > store the actual counter value before over-writing it with the new
> > configuration.
> >
> > As to your suggestion,
> > 1) we would have to do x86_pmu_disable() since that does
> > x86_perf_event_update().
> > 2) I worried about the case where we basically switch two counters,
> > there we cannot do the x86_perf_event_update() in a single pass since
> > programming the first counter will destroy the value of the second.
> >
> > Now possibly the scenario in 2 isn't possible because the event
> > scheduling is stable enough for this never to happen, but I wasn't
> > feeling too sure about that, so I skipped this part for now.
> >
> I think what adds to the complexity here is that there are two distinct
> disable() mechanisms: perf_disable() and x86_pmu.disable(). They
> don't operate the same way. You would think that by calling hw_perf_disable()
> you would stop individual events as well (thus saving their values). That
> means that if you do perf_disable() and then read the count, you will not
> get the up-to-date value in event->count. you need pmu->disable(event)
> to ensure that.

No, a read is actually good enough, it does x86_perf_event_update(), but
we're not guaranteeing that read is present.

So yes, perf_disable() is basically a local_irq_disable() but for perf
events, all it has to do is ensure there's no concurrency. ->disable()
will really tear the configuration down.

Either ->disable() or ->read() will end up calling
x86_perf_event_update() which is needed to read the actual hw counter
value and propagate it into event storage.

> So my understanding is that perf_disable() is meant for a temporary stop,
> thus no need to save the count.
> As for 2, I believe this can happen if you add 2 new events which have more
> restrictions. For instance on Core, you were measuring cycles, inst in generic
> counters, then you add 2 events which can only be measured on generic counters.
> That will cause cycles, inst to be moved to fixed counters.
> So we have to modify hw_perf_enable() to first disable all events
> which are moving,
> then reprogram them. I suspect it may be possible to optimize this if
> we detect that
> those events had already been stopped individually (as opposed to
> perf_disable()), i.e.,
> already had their counts saved.

Right, I see no fundamentally impossible things at all, we just need to
be careful here.

Anyway, I poked at the stack I've got now and it seems to hold up when I
poke at it with various combinations of constraint events, so I'll push
that off to Ingo and then we can go from there.

Thanks for working on this!

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at