Re: [PATCH V4 1/5] perf/x86: Extend event update interface

From: Peter Zijlstra
Date: Thu Aug 01 2024 - 10:03:58 EST


On Wed, Jul 31, 2024 at 07:38:31AM -0700, kan.liang@xxxxxxxxxxxxxxx wrote:
> From: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
>
> The current event update interface directly reads the values from the
> counter, but the values may not be the accurate ones users require. For
> example, the sample read feature wants the counter value of the member
> events when the leader event overflows. But with the current
> implementation, the read (event update) actually happens in the NMI
> handler. There may be a small gap between the overflow and the NMI
> handler.

This...

> The new Intel PEBS counters snapshotting feature can provide
> the accurate counter value at overflow. The event update interface
> has to be updated to apply the given accurate values.
>
> Pass the accurate values via the event update interface. If the value is
> not available, still directly read the counter.
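A minimal userspace sketch of the interface the changelog describes, under a simplified model. `struct model_counter`, `model_rdpmc()` and `model_swap_prev()` are hypothetical names, and C11 atomics stand in for `local64_try_cmpxchg()`. A non-NULL `val` is installed as the new raw value; NULL falls back to a live read, mirroring the loop in the quoted hunk.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one PMU counter; not the kernel's data structures. */
struct model_counter {
	uint64_t hw_value;           /* what a live read (rdpmc) would return */
	_Atomic uint64_t prev_count; /* stands in for hwc->prev_count */
};

/* Stand-in for rdpmcl(): read the live hardware value. */
static uint64_t model_rdpmc(const struct model_counter *c)
{
	return c->hw_value;
}

/*
 * Install a new raw value in prev_count, as the patched loop does:
 * use *val when the caller has a snapshot (e.g. from a PEBS record),
 * otherwise read the counter. The replaced raw value is returned via
 * *old so the caller can compute the delta; the new value is returned.
 */
static uint64_t model_swap_prev(struct model_counter *c, const uint64_t *val,
				uint64_t *old)
{
	uint64_t prev, now;

	assert(c != NULL && old != NULL);
	prev = atomic_load(&c->prev_count);
	do {
		now = val ? *val : model_rdpmc(c);
	} while (!atomic_compare_exchange_weak(&c->prev_count, &prev, now));
	*old = prev;
	return now;
}
```

Note that the compare-and-exchange only guards against a concurrent update of prev_count; nothing in this shape checks whether a supplied snapshot is older than the value already installed.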

> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 12f2a0c14d33..07a56bf71160 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -112,7 +112,7 @@ u64 __read_mostly hw_cache_extra_regs
> * Can only be executed on the CPU where the event is active.
> * Returns the delta events processed.
> */
> -u64 x86_perf_event_update(struct perf_event *event)
> +u64 x86_perf_event_update(struct perf_event *event, u64 *val)
> {
> struct hw_perf_event *hwc = &event->hw;
> int shift = 64 - x86_pmu.cntval_bits;
> @@ -131,7 +131,10 @@ u64 x86_perf_event_update(struct perf_event *event)
> */
> prev_raw_count = local64_read(&hwc->prev_count);
> do {
> - rdpmcl(hwc->event_base_rdpmc, new_raw_count);
> + if (!val)
> + rdpmcl(hwc->event_base_rdpmc, new_raw_count);
> + else
> + new_raw_count = *val;
> } while (!local64_try_cmpxchg(&hwc->prev_count,
> &prev_raw_count, new_raw_count));
>
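The `shift` in the hunk exists because counters are only `cntval_bits` wide. Assuming the unchanged remainder of the function (not shown in the hunk) computes the delta on u64 values as `((new << shift) - (prev << shift)) >> shift`, the standalone sketch below (`counter_delta()` is a hypothetical helper) shows that this yields `(new - prev) mod 2^cntval_bits`: a wrap past the top of the counter still gives a small positive delta, and bits above the physical width are ignored.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the delta step of x86_perf_event_update():
 * shift both raw values so bit (cntval_bits - 1) becomes bit 63, subtract
 * in 64-bit unsigned arithmetic, then shift back down. The result is
 * (now - prev) mod 2^cntval_bits.
 */
static uint64_t counter_delta(uint64_t prev, uint64_t now, int cntval_bits)
{
	int shift = 64 - cntval_bits;

	assert(cntval_bits > 0 && cntval_bits <= 64);
	uint64_t delta = (now << shift) - (prev << shift);
	return delta >> shift;
}
```

With 48-bit counters, going from 0xFFFFFFFFFFF0 to 0x10 gives a delta of 32, and the result is the same if the previous value had been sign-extended to 0xFFFFFFFFFFFFFFF0.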

Does that mean the following is possible?

Two counters: C0 and C1, where C0 is a PEBS counter that also samples
C1.

C0: overflow-with-PEBS-assist -> PEBS entry with counter value A
    (DS buffer threshold not reached)

C1: overflow -> PMI -> x86_perf_event_update(C1, NULL)
    rdpmcl reads value 'A+d', and sets prev_raw_count

C0: more assists, hit DS threshold -> PMI
    PEBS processing does x86_perf_event_update(C1, A)
    and sets prev_raw_count *backwards*

How is that sane?
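The C1 sequence can be replayed in a single-threaded sketch. It assumes the delta step computes `((new << shift) - (prev << shift)) >> shift` on u64 values with `shift = 64 - cntval_bits`, and it uses hypothetical names (`struct c1_state`, `c1_update()`, `replay_backwards()`) and illustrative numbers. Once the PMI read has advanced prev_raw_count to A+d, the later update with the older PEBS value A does not produce a negative delta; it produces 2^cntval_bits - d, so the event count jumps by nearly the full counter range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of C1's software state; not kernel code. */
struct c1_state {
	uint64_t prev_raw_count; /* models hwc->prev_count */
	uint64_t count;          /* models event->count */
	uint64_t hw_value;       /* what a live rdpmc would return */
	int cntval_bits;         /* counter width, e.g. 48 */
};

/* Same value selection and delta step as the patched path, without atomics. */
static uint64_t c1_update(struct c1_state *s, const uint64_t *val)
{
	int shift = 64 - s->cntval_bits;
	uint64_t prev = s->prev_raw_count;
	uint64_t now = val ? *val : s->hw_value;
	uint64_t delta;

	assert(s->cntval_bits > 0 && s->cntval_bits <= 64);
	s->prev_raw_count = now; /* moves backwards if *val is stale */
	delta = (now << shift) - (prev << shift);
	delta >>= shift;
	s->count += delta;
	return delta;
}

/* Replays the timeline: PMI read sees A + d, later PEBS drain supplies A. */
static uint64_t replay_backwards(uint64_t start, uint64_t a, uint64_t d, int bits)
{
	struct c1_state s = { .prev_raw_count = start, .count = 0,
			      .hw_value = a + d, .cntval_bits = bits };
	uint64_t snapshot = a;

	c1_update(&s, NULL);             /* C1 overflow -> PMI -> rdpmc reads A + d */
	return c1_update(&s, &snapshot); /* PEBS processing applies stale value A */
}
```

With A = 1000, d = 5 and 48-bit counters, the second update returns 2^48 - 5 (281474976710651).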