Re: perf: rdpmc bug when viewing all procs on remote cpu

From: Peter Zijlstra
Date: Fri Jan 18 2019 - 15:11:02 EST


On Fri, Jan 18, 2019 at 12:24:20PM -0500, Vince Weaver wrote:
> On Fri, 18 Jan 2019, Peter Zijlstra wrote:
> >
> > You can actually use rdpmc when you attach to a CPU, but you have to
> > ensure that the userspace component is guaranteed to run on that very
> > CPU (sched_setaffinity(2) comes to mind).
>
> unfortunately the HPC people using PAPI would probably be annoyed if we
> started binding their threads to cores out from under them.

Quite.. :-)
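
For reference, the pinning approach would look roughly like the sketch
below. The helper name pin_to_cpu() is made up for illustration; only
sched_setaffinity(2) comes from the discussion above, and this assumes
Linux with glibc.

```c
#define _GNU_SOURCE
#include <sched.h>

/*
 * Hypothetical helper: restrict the calling thread to target_cpu so that
 * a later rdpmc executes on the CPU the event was opened on.
 * Returns 0 on success, -1 on error (errno set by sched_setaffinity).
 */
int pin_to_cpu(int target_cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(target_cpu, &set);

    /* pid 0 means "the calling thread" */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

This is exactly the kind of binding a library like PAPI cannot impose on
its users, which is why it is not a general answer.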

> > The best we could possibly do is put the (target, not current) cpu
> > number in the mmap page; but userspace should already know this, since
> > it created the event.
>
> one other thing the kernel could do is just disable rdpmc (setting index
> to 0) in the case where the original perf_event_open() cpu parameter != 0
>
> though that would stop the case where we were on the same CPU from
> working.

Indeed.

> The issue is currently if you're not careful the rdpmc() interface will
> sometimes return plausible (but wrong) results for a cross-CPU rdpmc()
> call, even if you are properly falling back to read() on ->index being 0.
> It's a bit surprising and it looks like it will take a decent amount of
> userspace code to work around the issue, which cuts into the low-overhead
> nature of rdpmc.
>
> If the answer is simply this is the way the kernel is going to do it,
> that's fine, I just have to add workarounds to PAPI and then get the
> perf_event_open() manpage updated to make sure people are aware of the
> issue.

I'm not sure there really is anything the kernel can do to help here...
One thing you could look at is using rseq together with adding a CPU
number to the userspace descriptor, and if rseq gives you a matching CPU
number use rdpmc, otherwise, or on rseq abort, use read().
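
A rough sketch of the decision logic, not a real rseq critical section:
struct event_desc, choose_path() and choose_path_now() are hypothetical
names, and sched_getcpu() stands in for the CPU number rseq would
provide. Unlike rseq, this cannot detect a migration that happens between
the check and the rdpmc instruction, so it only narrows the window.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdint.h>

enum read_path { PATH_RDPMC, PATH_READ };

/* Hypothetical userspace descriptor kept next to the mmap page. */
struct event_desc {
    int target_cpu;  /* cpu argument given to perf_event_open(), -1 if none */
    uint32_t index;  /* copy of the mmap page index; 0 means no rdpmc */
};

/* Pick the read path given the CPU we believe we are running on. */
enum read_path choose_path(const struct event_desc *d, int current_cpu)
{
    if (d->index == 0)
        return PATH_READ;       /* kernel says rdpmc is not usable */
    if (d->target_cpu >= 0 && d->target_cpu != current_cpu)
        return PATH_READ;       /* event lives on another CPU */
    return PATH_RDPMC;
}

/* Convenience wrapper; a failing sched_getcpu() (-1) falls back to read(). */
enum read_path choose_path_now(const struct event_desc *d)
{
    return choose_path(d, sched_getcpu());
}
```

With real rseq the CPU comparison and the rdpmc would sit inside one
restartable sequence, and the abort handler would take the read() path.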