Re: [PATCH 1/3] perf/core: Flush PMU internal buffers for per-CPU events

From: Peter Zijlstra
Date: Mon Nov 09 2020 - 06:04:19 EST


On Mon, Nov 09, 2020 at 10:52:35AM +0100, Peter Zijlstra wrote:
> On Fri, Nov 06, 2020 at 01:29:33PM -0800, kan.liang@xxxxxxxxxxxxxxx wrote:
> > From: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
> >
> > Sometimes the PMU internal buffers have to be flushed for per-CPU events
> > during a context switch, e.g., large PEBS. Otherwise, the perf tool may
> > report samples in locations that do not belong to the process in which
> > the samples are processed, because PEBS does not tag samples with PID/TID.
> >
> > The current code only flushes the buffers for a per-task event. It doesn't
> > check a per-CPU event.
> >
> > Add a new event state flag, PERF_ATTACH_SCHED_CB, to indicate that the
> > PMU internal buffers have to be flushed for this event during a context
> > switch.
> >
> > Add sched_cb_entry and perf_sched_cb_usages back to track the PMU/cpuctx
> > which is required to be flushed.
> >
> > Only need to invoke the sched_task() for per-CPU events in this patch.
> > The per-task events have been handled in perf_event_context_sched_in/out
> > already.
> >
> > Fixes: 9c964efa4330 ("perf/x86/intel: Drain the PEBS buffer during context switches")
>
> Are you sure? In part this patch looks like a revert of:
>
> 44fae179ce73a26733d9e2d346da4e1a1cb94647
> 556cccad389717d6eb4f5a24b45ff41cad3aaabf

*groan*... I think I might've made a mistake with those two patches. I
assumed the whole cpuctx->task_ctx thing was relevant, it is not.

As per perf_sched_cb_{inc,dec}(struct pmu *), the thing we care about is
that the *PMU* gets a context switch callback, we don't give a crap
about the actual task context. Except that LBR code does, but I'm
thinking that started the whole confusion -- and I'm still not sure it's
actually correct either.
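
To make the "PMU gets the callback" point concrete, here is a minimal
userspace sketch. It is not the kernel code: struct sketch_pmu, the
sketch_* helpers and the flush_count field are all made up for
illustration, and the real counter is per-CPU state rather than a plain
field on the PMU. The only thing it tries to show is that the decision to
call sched_task() hangs off a per-PMU usage count and never looks at a
task context.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a PMU; not the kernel's struct pmu. */
struct sketch_pmu {
	int sched_cb_usage;	/* how many events asked for the callback */
	int flush_count;	/* illustration only: callback invocations */
	void (*sched_task)(struct sketch_pmu *pmu, int sched_in);
};

/* Modelled loosely on perf_sched_cb_{inc,dec}(): the argument is a PMU. */
static void sketch_sched_cb_inc(struct sketch_pmu *pmu)
{
	pmu->sched_cb_usage++;
}

static void sketch_sched_cb_dec(struct sketch_pmu *pmu)
{
	assert(pmu->sched_cb_usage > 0);
	pmu->sched_cb_usage--;
}

/* On a context switch, call back every PMU that has a non-zero usage count. */
static void sketch_context_switch(struct sketch_pmu *pmus, size_t nr, int sched_in)
{
	for (size_t i = 0; i < nr; i++) {
		struct sketch_pmu *pmu = &pmus[i];

		if (pmu->sched_cb_usage && pmu->sched_task)
			pmu->sched_task(pmu, sched_in);
	}
}

/* Stand-in for a driver callback that would drain a PEBS-like buffer. */
static void sketch_flush(struct sketch_pmu *pmu, int sched_in)
{
	(void)sched_in;
	pmu->flush_count++;
}

/* Returns the total number of callbacks made across both PMUs. */
static int sketch_demo(void)
{
	struct sketch_pmu pmus[2] = {
		{ .sched_task = sketch_flush },
		{ .sched_task = sketch_flush },
	};

	sketch_sched_cb_inc(&pmus[0]);		/* only PMU 0 wants the callback */
	sketch_context_switch(pmus, 2, 0);	/* sched out: PMU 0 is called */
	sketch_sched_cb_dec(&pmus[0]);
	sketch_context_switch(pmus, 2, 1);	/* sched in: nobody is called */

	return pmus[0].flush_count + pmus[1].flush_count;
}
```

Whether the event that bumped the count was per-task or per-CPU makes no
difference in this model, which is the point being argued above.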

Now,.. how did we end up with the above two patches anyway... /me frobs
around in the inbox... Ah! that daft user RDPMC thing. I wanted to avoid
yet another pmu::method().

Hmm.. and the reason I proposed that change is because we'd end up with
the sched_cb for every context switch now, not just large-pebs and/or
lbr crud. And this form avoids the double perf_pmu_disable() and all
that.

Maybe we can frob x86_pmu_enable()...