Re: [RFC 2/3] perf/x86: Control RDPMC access from .enable() hook
From: Rob Herring
Date: Mon Aug 30 2021 - 17:40:50 EST
On Mon, Aug 30, 2021 at 3:21 PM Vince Weaver <vincent.weaver@xxxxxxxxx> wrote:
>
> On Mon, 30 Aug 2021, Peter Zijlstra wrote:
>
> > There's just not much we can do to validate the usage, fundamentally at
> > RDPMC time we're not running any kernel code, so we can't validate the
> > conditions under which we're called.
> >
> > I suppose one way would be to create a mode where RDPMC is disabled but
> > emulated -- which completely voids the reason for using RDPMC in the
> > first place (performance), but would allow us to validate the usage.
> >
> > Fundamentally, we must call RDPMC only for events that are currently
> > active on *this* CPU. Currently we rely on userspace to DTRT and if it
> > doesn't we have no way of knowing and it gets to keep the pieces.
>
> yes, though it would be nice for cases where things will never work (such
> as process-attach? I think even if pinned to the same CPU that won't
> work?) Maybe somehow the mmap page could be set in a way to indicate we
> should fall back to the syscall. Maybe set pc->index to an invalid value
> so we can use the existing syscall fallback code.
>
> We could force every userspace program to know all the unsupported cases
> but it seems like it could be easier and less failure-prone to centralize
> this in the kernel.
>
> I was looking into maybe creating a patch for this but the magic perf
> mmap page implementation is complex enough that I'm not sure I'm qualified
> to mess with it.
There's now an implementation in libperf[1]. perf_evsel__read() will
use it[2] and fall back to a read() call if necessary (but will still
happily give you wrong values if reading on the wrong CPU).
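
For anyone following along, here is a rough sketch of the read path being
discussed: a seqlock-style loop over the mmap page that gives up and uses
the slow path when index is 0 or user RDPMC is not permitted. This is not
the libperf code. struct fake_mmap_page is a hypothetical stand-in that
only mirrors the lock, index, offset, pmc_width and cap_user_rdpmc fields
of the real page, and demo_pmc()/demo_syscall() are made-up stubs standing
in for the RDPMC instruction and a read() on the event fd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few mmap page fields this sketch needs. */
struct fake_mmap_page {
	uint32_t lock;           /* sequence count, changes around kernel updates */
	uint32_t index;          /* hardware counter index + 1; 0 = not readable */
	int64_t  offset;         /* added to the raw counter value */
	uint16_t pmc_width;      /* counter width in bits */
	bool     cap_user_rdpmc; /* userspace counter reads permitted */
};

typedef uint64_t (*pmc_read_fn)(uint32_t counter); /* stand-in for RDPMC */
typedef int64_t (*slow_read_fn)(void);             /* stand-in for read() */

/* Sign-extend a raw counter of 'width' bits to 64 bits.
 * Assumes arithmetic right shift of signed values (true for gcc/clang). */
int64_t sign_extend(uint64_t raw, unsigned width)
{
	assert(width > 0 && width <= 64);
	unsigned shift = 64 - width;
	return (int64_t)(raw << shift) >> shift;
}

/* Try the userspace fast path; return false if the page says "don't". */
bool try_user_read(const volatile struct fake_mmap_page *pc,
		   pmc_read_fn read_pmc, int64_t *out)
{
	uint32_t seq, idx;
	int64_t count;

	do {
		seq = pc->lock;
		__sync_synchronize();	/* full barrier (gcc/clang builtin) */
		idx = pc->index;
		if (!pc->cap_user_rdpmc || idx == 0)
			return false;	/* caller must use the slow path */
		count = pc->offset +
			sign_extend(read_pmc(idx - 1), pc->pmc_width);
		__sync_synchronize();
	} while (pc->lock != seq);	/* page changed under us: retry */

	*out = count;
	return true;
}

/* Fast path if allowed, otherwise fall back to the slow read. */
int64_t read_counter(const volatile struct fake_mmap_page *pc,
		     pmc_read_fn read_pmc, slow_read_fn slow_read)
{
	int64_t value;

	if (try_user_read(pc, read_pmc, &value))
		return value;
	return slow_read();
}

/* Made-up stubs so the sketch can be exercised without a PMU. */
uint64_t demo_pmc(uint32_t counter) { return 100 + counter; }
int64_t demo_syscall(void) { return 42; }
```

Nothing in this loop can tell whether the event is actually active on the
CPU executing it, which is the problem discussed above. An index of 0 is
the only signal userspace gets to take the slow path.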
Rob
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/perf/mmap.c#n302
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/perf/evsel.c#n305