Re: [PATCH V5 4/4] perf/x86/intel: Support PEBS counters snapshotting

From: Peter Zijlstra
Date: Wed Dec 18 2024 - 03:24:18 EST


On Tue, Dec 17, 2024 at 12:45:56PM -0500, Liang, Kan wrote:
>
>
> On 2024-12-17 8:37 a.m., Peter Zijlstra wrote:
> > On Mon, Dec 16, 2024 at 12:45:05PM -0800, kan.liang@xxxxxxxxxxxxxxx wrote:
> >
> >> @@ -2049,6 +2102,40 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
> >> }
> >> }
> >>
> >> + if (format_group & (PEBS_DATACFG_CNTR | PEBS_DATACFG_METRICS)) {
> >> + struct pebs_cntr_header *cntr = next_record;
> >> + int bit;
> >> +
> >> + next_record += sizeof(struct pebs_cntr_header);
> >> +
> >> + for_each_set_bit(bit, (unsigned long *)&cntr->cntr, INTEL_PMC_MAX_GENERIC) {
> >> + x86_perf_event_update(cpuc->events[bit], (u64 *)next_record);
> >> + next_record += sizeof(u64);
> >> + }
> >
> > I still don't much like any of this -- the next_record value might be in
> > the past relative to what is already in the event.
> >
>
> I forgot to handle the read case. With the below patch, the next_record
> value should not contain a previous value, unless there is a HW bug.
> Because no matter it's read/pmi/context switch, perf always stop the
> PMU, drains the buffer, and the value is always from the PEBS record.

Consider this 3 counter case:

A is our regular counter
B is a PEBS event which reads A
C is a PEBS event

For convenience, let A count the lines in our example (anything
incrementing at a faster rate will get the same result after all).

Then, afaict, the following can happen:

B-assist A=1
C A=2
B-assist A=3
A-overflow-PMI A=4
C-assist-PMI (DS buffer) A=5

So the A-PMI will update A->count to 4.
Then the C-PMI will process the DS buffer, which will:

- adjust A->count to 1
- adjust A->count to 3

Leaving us, in the end, with A going backwards.

How is that not a problem?

IIRC I've raised this point before, and your Changelog does not mention
anything about this issue at all :-(