RE: [PATCH] perf/x86: fix event counter update issue

From: Liang, Kan
Date: Mon Nov 28 2016 - 15:23:35 EST




> >> > Case B: In PMI, new > prev, delta is positive.
> >> > That's the failure case of Test 3.
> >> > The PMI is triggered by an overflow, but there is latency between the
> >> > overflow and the PMI handler, so new holds only a small value.
> >> > The current calculation loses the max count value.
> >> >
> >> > Case C: In PMI, new < prev, delta is negative.
> >> > The PMU counter may start from a big value, e.g. when the fixed period
> >> > is small.
> >> > It can be fixed by adding the max count value.
> >> >
> >> I am not sure I understand why PMI should matter here. What matters
> >> is prev vs. current and the wrap-around.
> >> Please explain.
> >> Thanks.
> >
> > Right, PMI shouldn't matter here.
> > We should add the max count value if delta is negative, no matter whether
> > it's in PMI.
> >
> That sounds right, but then why do you have that boolean (bool pmi) in
> your patch?

For case B, we still need to add the max count value when new > prev in the PMI handler.
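
To make case B concrete, here is a minimal user-space sketch of the same wrap
correction the patch applies. The 48-bit counter width and the raw values are
illustrative assumptions, not numbers taken from the test:

#include <stdio.h>
#include <stdint.h>

/* Assume a 48-bit counter, mirroring x86_pmu.cntval_bits/cntval_mask. */
#define CNTVAL_BITS	48
#define CNTVAL_MASK	((1ULL << CNTVAL_BITS) - 1)

/*
 * Sign-extend a CNTVAL_BITS-wide raw difference, the same way
 * x86_perf_event_update() does with its shift by (64 - cntval_bits).
 */
static int64_t raw_delta(uint64_t prev_raw, uint64_t new_raw)
{
	int shift = 64 - CNTVAL_BITS;

	return ((int64_t)(new_raw << shift) - (int64_t)(prev_raw << shift)) >> shift;
}

int main(void)
{
	/*
	 * Case B sketch: the counter overflowed and, because the PMI was
	 * handled late while events kept arriving, the raw value has already
	 * counted past prev again. new - prev is positive but one full
	 * counter wrap short.
	 */
	uint64_t prev_raw = 0x10;	/* hypothetical last-read raw value */
	uint64_t new_raw  = 0x50;	/* wrapped once, then reached 0x50 */
	int64_t delta = raw_delta(prev_raw, new_raw);
	int in_pmi = 1;			/* the overflow is known to have happened */

	printf("uncorrected delta: 0x%llx\n", (unsigned long long)delta);

	/* Same condition as the patch: positive delta in PMI, or negative delta. */
	if ((delta > 0 && in_pmi) || delta < 0)
		delta += CNTVAL_MASK + 1;

	printf("corrected delta:   0x%llx\n", (unsigned long long)delta);

	return 0;
}

Without the correction the reported delta is 0x40; with it, the full
0x1000000000040 events are counted.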

Thanks,
Kan

>
> > To fix this issue, I once had a list which included all the possible cases.
> > "non-PMI, new < prev, delta is negative" is one of them. But it rarely
> > happens.
> > So I removed it, but forgot to modify the description of Case C.
> >
> > Thanks,
> > Kan
> >
> >>
> >> >
> >> > There is still a case in which delta could be wrong.
> >> > That is when an event update happens in the gap between the overflow
> >> > and the PMI: it's not in PMI, new > prev, and delta is positive, so the
> >> > current calculation may lose the max count value.
> >> > But this case rarely happens, so it isn't handled here.
> >> >
> >> > Reported-by: Lukasz Odzioba <lukasz.odzioba@xxxxxxxxx>
> >> > Signed-off-by: Kan Liang <kan.liang@xxxxxxxxx>
> >> > Tested-by: Lukasz Odzioba <lukasz.odzioba@xxxxxxxxx>
> >> > ---
> >> > arch/x86/events/core.c | 23 +++++++++++++++++++----
> >> > arch/x86/events/intel/core.c | 4 ++--
> >> > arch/x86/events/intel/p4.c | 2 +-
> >> > arch/x86/events/perf_event.h | 2 +-
> >> > 4 files changed, 23 insertions(+), 8 deletions(-)
> >> >
> >> > diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index
> >> > 6c3b0ef..c351ac4 100644
> >> > --- a/arch/x86/events/core.c
> >> > +++ b/arch/x86/events/core.c
> >> > @@ -63,7 +63,7 @@ u64 __read_mostly hw_cache_extra_regs
> >> > * Can only be executed on the CPU where the event is active.
> >> > * Returns the delta events processed.
> >> > */
> >> > -u64 x86_perf_event_update(struct perf_event *event)
> >> > +u64 x86_perf_event_update(struct perf_event *event, bool pmi)
> >> > {
> >> > struct hw_perf_event *hwc = &event->hw;
> >> > int shift = 64 - x86_pmu.cntval_bits; @@ -100,6 +100,21 @@
> >> > u64 x86_perf_event_update(struct perf_event *event)
> >> > delta = (new_raw_count << shift) - (prev_raw_count << shift);
> >> > delta >>= shift;
> >> >
> >> > + /*
> >> > + * Correct delta for special cases caused by a late PMI handler.
> >> > + * Case1: The PMU counter may start from a big value.
> >> > + * In PMI, new < prev, so delta is negative.
> >> > + * Case2: new is big enough to affect the sign bit.
> >> > + * delta becomes negative after the arithmetic right shift.
> >> > + * Case3: In PMI, new > prev.
> >> > + * new - prev loses the max count value.
> >> > + *
> >> > + * An event update may also happen in the gap between overflow and
> >> > + * PMI, but that rarely happens, so it isn't handled here.
> >> > + */
> >> > + if (((delta > 0) && pmi) || (delta < 0))
> >> > + delta += x86_pmu.cntval_mask + 1;
> >> > +
> >> > local64_add(delta, &event->count);
> >> > local64_sub(delta, &hwc->period_left);
> >> >
> >> > @@ -1342,7 +1357,7 @@ void x86_pmu_stop(struct perf_event *event,
> >> int flags)
> >> > * Drain the remaining delta count out of a event
> >> > * that we are disabling:
> >> > */
> >> > - x86_perf_event_update(event);
> >> > + x86_perf_event_update(event, (flags == 0));
> >> > hwc->state |= PERF_HES_UPTODATE;
> >> > }
> >> > }
> >> > @@ -1446,7 +1461,7 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
> >> >
> >> > event = cpuc->events[idx];
> >> >
> >> > - val = x86_perf_event_update(event);
> >> > + val = x86_perf_event_update(event, true);
> >> > if (val & (1ULL << (x86_pmu.cntval_bits - 1)))
> >> > continue;
> >> >
> >> > @@ -1867,7 +1882,7 @@ early_initcall(init_hw_perf_events);
> >> >
> >> > static inline void x86_pmu_read(struct perf_event *event) {
> >> > - x86_perf_event_update(event);
> >> > + x86_perf_event_update(event, false);
> >> > }
> >> >
> >> > /*
> >> > diff --git a/arch/x86/events/intel/core.c
> >> > b/arch/x86/events/intel/core.c index a74a2db..69d65e6 100644
> >> > --- a/arch/x86/events/intel/core.c
> >> > +++ b/arch/x86/events/intel/core.c
> >> > @@ -1830,7 +1830,7 @@ static void intel_pmu_nhm_workaround(void)
> >> > for (i = 0; i < 4; i++) {
> >> > event = cpuc->events[i];
> >> > if (event)
> >> > - x86_perf_event_update(event);
> >> > + x86_perf_event_update(event, false);
> >> > }
> >> >
> >> > for (i = 0; i < 4; i++) {
> >> > @@ -2002,7 +2002,7 @@ static void intel_pmu_add_event(struct
> >> perf_event *event)
> >> > */
> >> > int intel_pmu_save_and_restart(struct perf_event *event) {
> >> > - x86_perf_event_update(event);
> >> > + x86_perf_event_update(event, true);
> >> > /*
> >> > * For a checkpointed counter always reset back to 0. This
> >> > * avoids a situation where the counter overflows, aborts
> >> > the diff --git a/arch/x86/events/intel/p4.c
> >> > b/arch/x86/events/intel/p4.c index eb05335..8117de8 100644
> >> > --- a/arch/x86/events/intel/p4.c
> >> > +++ b/arch/x86/events/intel/p4.c
> >> > @@ -1024,7 +1024,7 @@ static int p4_pmu_handle_irq(struct pt_regs
> >> *regs)
> >> > /* it might be unflagged overflow */
> >> > overflow = p4_pmu_clear_cccr_ovf(hwc);
> >> >
> >> > - val = x86_perf_event_update(event);
> >> > + val = x86_perf_event_update(event, true);
> >> > if (!overflow && (val & (1ULL << (x86_pmu.cntval_bits - 1))))
> >> > continue;
> >> >
> >> > diff --git a/arch/x86/events/perf_event.h
> >> > b/arch/x86/events/perf_event.h index c6b25ac..09c9db2 100644
> >> > --- a/arch/x86/events/perf_event.h
> >> > +++ b/arch/x86/events/perf_event.h
> >> > @@ -712,7 +712,7 @@ extern u64 __read_mostly
> hw_cache_extra_regs
> >> > [PERF_COUNT_HW_CACHE_OP_MAX]
> >> > [PERF_COUNT_HW_CACHE_RESULT_MAX];
> >> >
> >> > -u64 x86_perf_event_update(struct perf_event *event);
> >> > +u64 x86_perf_event_update(struct perf_event *event, bool pmi);
> >> >
> >> > static inline unsigned int x86_pmu_config_addr(int index) {
> >> > --
> >> > 2.4.3
> >> >