Re: [PATCH 0/2] x86/intel_rdt and perf/x86: Fix lack of coordination with perf

From: Peter Zijlstra
Date: Wed Aug 08 2018 - 03:41:20 EST


On Tue, Aug 07, 2018 at 10:44:44PM -0700, Reinette Chatre wrote:
> Hi Tony,
>
> On 8/7/2018 6:28 PM, Luck, Tony wrote:
> > Would it help to call routines to read the "before" values of the counter
> > twice. The first time to preload the cache with anything needed to execute
> > the perf code path.

Indeed, that is the 'common' pattern for this.


> First, reading data using perf_event_read_local() called twice.
> When testing as follows:
> /* create perf events */
> /* disable irq */
> /* disable hw prefetchers */
> /* init local vars */
> /* read before data twice as follows: */
> perf_event_read_local(l2_hit_event, &l2_hits_before, NULL, NULL);
> perf_event_read_local(l2_miss_event, &l2_miss_before, NULL, NULL);
> perf_event_read_local(l2_hit_event, &l2_hits_before, NULL, NULL);
> perf_event_read_local(l2_miss_event, &l2_miss_before, NULL, NULL);
> /* read through pseudo-locked memory */
> perf_event_read_local(l2_hit_event, &l2_hits_after, NULL, NULL);
> perf_event_read_local(l2_miss_event, &l2_miss_after, NULL, NULL);
> /* re enable hw prefetchers */
> /* enable irq */
> /* write data to tracepoint */
>
> With the above I am not able to obtain accurate data:
> pseudo_lock_mea-354 [002] .... 63.045734: pseudo_lock_l2: hits=4103 miss=6

So _why_ doesn't this work? As said by Tony, that first call should
prime the caches, so the second and third calls should not generate any
misses.

They might cause extra hits, but that should be a constant amount,
which is also measurable with a no-op loop and can easily be
subtracted.