Re: [PATCH v12] perf: Sharing PMU counters across compatible events

From: Peter Zijlstra
Date: Tue Apr 21 2020 - 03:19:03 EST


On Tue, Apr 21, 2020 at 01:13:32AM +0000, Song Liu wrote:
> >> static inline u64 perf_event_count(struct perf_event *event)
> >> {
> >> -	return local64_read(&event->count) + atomic64_read(&event->child_count);
> >> +	u64 count;
> >> +
> >> +	if (likely(event->dup_master != event))
> >> +		count = local64_read(&event->count);
> >> +	else
> >> +		count = local64_read(&event->master_count);
> >> +
> >> +	return count + atomic64_read(&event->child_count);
> >> }
> >
> > So last time I said something about SMP ordering here. Where did that
> > go?
>
> I guess we will need something with EVENT_TOMBSTONE? That's not clean, I guess.
>
> >
> > Specifically, without ordering it is possible to observe dup_master and
> > dup_count out of order. So while we might then see dup_master, we might
> > then also see an old dup_count, which would give 'interesting' effects.
> >
> > Granted, adding memory barriers all over this will suck :/
>
> Can we somehow guarantee dup_master and the counts sit in the same
> cache line? In that way, we can get rid of most of these barriers.

I'm afraid memory ordering doesn't quite work that way. It is also very
much outside every architectural guarantee ever written.

Specifically, this does not consider store buffers and snoops.
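
For illustration only, here is a minimal user-space sketch of the kind of
ordering being asked for, written with C11 atomics rather than kernel
primitives. Everything in it is an assumption made for the example:
struct demo_event, demo_become_master() and demo_event_count() are
hypothetical names, not code from the patch. The writer publishes
dup_master with a release store after setting up master_count, and the
reader uses an acquire load, so a reader that sees the new dup_master also
sees the count written before it. In the kernel the analogous primitives
are smp_store_release() and smp_load_acquire(); whether that pairing alone
is sufficient for the patch (for instance on the teardown path) is not
settled here.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields under discussion; this is not
 * the kernel's struct perf_event. */
struct demo_event {
	_Atomic(struct demo_event *) dup_master;
	_Atomic uint64_t count;
	_Atomic uint64_t master_count;
};

/* Writer: set up master_count first, then publish dup_master.
 * The release store orders the earlier store before the publication. */
static void demo_become_master(struct demo_event *e, uint64_t start)
{
	atomic_store_explicit(&e->master_count, start, memory_order_relaxed);
	atomic_store_explicit(&e->dup_master, e, memory_order_release);
}

/* Reader: the acquire load pairs with the release store above, so
 * observing dup_master == e implies observing the matching master_count. */
static uint64_t demo_event_count(struct demo_event *e)
{
	if (atomic_load_explicit(&e->dup_master, memory_order_acquire) == e)
		return atomic_load_explicit(&e->master_count, memory_order_relaxed);

	return atomic_load_explicit(&e->count, memory_order_relaxed);
}
```

Placing both fields in one cache line would not replace this pairing: the
ordering comes from the release/acquire annotations, not from where the
data happens to live.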