Re: [PATCH] perf_events: improve x86 event scheduling (v6 incremental)

From: stephane eranian
Date: Mon Jan 25 2010 - 12:12:31 EST

On Fri, Jan 22, 2010 at 9:27 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Thu, 2010-01-21 at 17:39 +0200, Stephane Eranian wrote:
>> @@ -1395,40 +1430,28 @@ void hw_perf_enable(void)
>>                  * apply assignment obtained either from
>>                  * hw_perf_group_sched_in() or x86_pmu_enable()
>>                  *
>> -                * step1: save events moving to new counters
>> -                * step2: reprogram moved events into new counters
>> +                * We either re-enable or re-program and re-enable.
>> +                * All events are disabled by the time we come here.
>> +                * That means their state has been saved already.
>>                  */
> I'm not seeing how it is true.

> Suppose a core2 with counter0 active counting a non-restricted event,
> say cpu_cycles. Then we do:
> perf_disable()
>  hw_perf_disable()
>    intel_pmu_disable_all
Everything is disabled globally, yet counter0 is not disabled
individually. But the global disable is enough to stop it.

> ->enable(MEM_LOAD_RETIRED) /* constrained to counter0 */
>  x86_pmu_enable()
>    collect_events()
>    x86_schedule_events()
>    n_added = 1
>    /* also slightly confused about this */
>    if (hwc->idx != -1)
>      x86_perf_event_set_period()

In x86_pmu_enable(), we have not yet actually assigned the
counter to hwc->idx. This is only accomplished by hw_perf_enable().
Yet, x86_perf_event_set_period() is going to write the MSR.

My understanding is that you never call enable(event) in code
outside of a perf_disable()/perf_enable() section.

> perf_enable()
>  hw_perf_enable()
>    /* and here we'll assign the new event to counter0
>     * except we never disabled it... */
You will have two events scheduled: cycles in counter1
and mem_load_retired in counter0. Neither hwc->idx
will match its previous state, and thus both will be rewritten.

I think the case you are worried about is different. It is the
case where you would move an event to a new counter
without replacing it with a new event. Given that the individual
MSR.en would still be 1 AND that enable_all() enables all
counters (even the ones not actively used), then we would
get a runaway counter so to speak.

It seems a solution would be to call x86_pmu_disable() before
assigning an event to a new counter for all events which are
moving. This is because we cannot assume all events have been
previously disabled individually. Something like

	if (!match_prev_assignment(hwc, cpuc, i)) {
		/* event is moving: stop its old counter first */
		if (hwc->idx != -1)
			x86_pmu.disable(hwc, hwc->idx);
		x86_assign_hw_event(event, cpuc, cpuc->assign[i]);
		x86_perf_event_set_period(event, hwc, hwc->idx);
	}
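
To make the runaway-counter case concrete, here is a toy user-space
model. It is only a sketch, not kernel code: every name in it
(toy_pmu, toy_event, toy_tick, toy_move_event, toy_run) is made up
for illustration. It assumes a counter counts only when both the
global enable and its own enable bit are set, and it moves cycles
from counter0 to counter1 while everything is globally disabled.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_COUNTERS 2

/* Toy PMU: counter i counts only if global_en AND en[i] are both set. */
struct toy_pmu {
	bool global_en;
	bool en[NUM_COUNTERS];
	unsigned long count[NUM_COUNTERS];
};

/* Toy event: idx is the counter it currently occupies, -1 if none. */
struct toy_event {
	int idx;
};

/* Advance the toy PMU by one "cycle". */
static void toy_tick(struct toy_pmu *pmu)
{
	for (int i = 0; i < NUM_COUNTERS; i++)
		if (pmu->global_en && pmu->en[i])
			pmu->count[i]++;
}

/*
 * Move an event to counter new_idx. When disable_old is true, the
 * counter the event is leaving has its individual enable bit cleared
 * first, mirroring the disable-before-reassign idea suggested above.
 */
static void toy_move_event(struct toy_pmu *pmu, struct toy_event *ev,
			   int new_idx, bool disable_old)
{
	assert(new_idx >= 0 && new_idx < NUM_COUNTERS);

	if (disable_old && ev->idx != -1)
		pmu->en[ev->idx] = false;

	ev->idx = new_idx;
	pmu->count[new_idx] = 0;
	pmu->en[new_idx] = true;
}

/*
 * Scenario: cycles sits in counter0 with its enable bit set, the PMU
 * is globally disabled, cycles moves to counter1 and nothing replaces
 * it in counter0. Returns what the vacated counter0 accumulates once
 * the PMU is globally re-enabled for the given number of ticks.
 */
static unsigned long toy_run(bool disable_old, unsigned int ticks)
{
	struct toy_pmu pmu = { .global_en = false, .en = { true, false } };
	struct toy_event cycles = { .idx = 0 };

	toy_move_event(&pmu, &cycles, 1, disable_old);

	pmu.global_en = true;	/* analogous to enable_all() */
	for (unsigned int t = 0; t < ticks; t++)
		toy_tick(&pmu);

	return pmu.count[0];
}
```

In this model, toy_run(false, 3) leaves counter0 running with no
owner, while toy_run(true, 3) keeps it at zero because the old
counter was disabled before the move.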

>    intel_pmu_enable_all()
>      wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, intel_ctrl)
> Or am I missing something?