Re: [RFC 2/2] perf: Sharing PMU counters across compatible events

From: Peter Zijlstra
Date: Mon May 28 2018 - 08:39:34 EST


On Fri, May 04, 2018 at 04:11:02PM -0700, Song Liu wrote:
> On the critical paths, perf_events are added to/removed from the
> active_dup list of the perf_event. The first event added to the list
> will be the master event, and the only event that runs pmu->add().
> Later events will all refer to this master for read().
>
> cpuctx -> perf_event_dup -> master
> ^ -> active_dup (list)
> | ^ ^
> perf_event /| ----------/ |
> | |
> perf_event / -------------/
>

> +static void add_event_to_dup_event_list(struct perf_event *event,
> + struct perf_cpu_context *cpuctx)
> +{
> + int i;
> +
> + for (i = 0; i < cpuctx->dup_event_count; ++i)
> + if (memcmp(&event->attr,
> + &cpuctx->dup_event_list[i].first->attr,
> + sizeof(event->attr)) == 0) {
> + event->dup_id = i;
> + return;
> + }

(style nit: this needs {})

So we merge events when the attr's are an exact match; which includes
sampling and all those fancy things, right?

I think this scheme causes phase shifts in the samples when we combine
two otherwise identical events. Because while they have the same
sampling interval, they might not have the same effective runtime and
thus have a different 'remainder' for the current sample interval.

This could add up to a significant sample skew for unlucky
circumstances. On average I think it works out, but if you're always
landing on a shorter interval, the effective sample rate can go up
significantly.

> + i = cpuctx->dup_event_count++;
> + cpuctx->dup_event_list[i].first = event;
> + cpuctx->dup_event_list[i].master = NULL;
> + INIT_LIST_HEAD(&cpuctx->dup_event_list[i].active_dup);
> + event->dup_id = i;
> + INIT_LIST_HEAD(&event->dup_sibling_entry);
> +}