Re: [PATCH 2/2] perf/x86/intel: Fix PMI handling for Intel PT

From: Alexander Shishkin
Date: Fri Jun 12 2015 - 05:08:52 EST


Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:

> On Thu, Jun 11, 2015 at 03:13:57PM +0300, Alexander Shishkin wrote:
>> @@ -1426,7 +1426,23 @@ perf_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
>> u64 finish_clock;
>> int ret;
>>
>> - if (!atomic_read(&active_events))
>> + /*
>> + * Intel PT is a separate PMUs, it doesn't increment active_events
>> + * nor does it use resources associated with it, such as DS area.
>> + * However, it still uses the same PMI handler, so here we need
>> + * to make sure to call it *also* if any PT events are present.
>> + *
>> + * BTS events currently increment active_events implicitly through
>> + * x86_reserve_hardware(), where it acts as DS area reference
>> + * counter, so no need to check x86_lbr_exclusive_bts counter here;
>> + * otherwise we'd have to do the same for BTS.
>> + *
>> + * Theoretically, they could be made into separate PMI handlers, but
>> + * that can create additional challenges as PT uses the same PMI status
>> + * register as x86_pmu.
>> + */
>> + if (!atomic_read(&active_events) &&
>> + !atomic_read(&x86_pmu.lbr_exclusive[x86_lbr_exclusive_pt]))
>> return NMI_DONE;
>>
>
> Urgh, sad. That's two cache misses instead of one. Cant we somehow keep
> that a single value to read?

Indeed. One thing here is that active_events actually serves dual
purpose and does reference counting for
{reserve,release}_{ds_buffers,pmc_hardware}(). We can split it in two
and then we can add PT events to active_events as well, so we can make
do with just one atomic_read() here. How about the patch below instead
of the previously posted one?