RE: [PATCH v2] drivers/perf: Enable PID_IN_CONTEXTIDR with SPE

From: Al Grant
Date: Thu Jan 07 2021 - 13:02:37 EST


> From: Mark Rutland <mark.rutland@xxxxxxx>
> Sent: 06 January 2021 10:24
> To: James Clark <James.Clark@xxxxxxx>
> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-
> perf-users@xxxxxxxxxxxxxxx; will@xxxxxxxxxx; leo.yan@xxxxxxxxxx; Al Grant
> <Al.Grant@xxxxxxx>; John Garry <john.garry@xxxxxxxxxx>; Suzuki Poulose
> <Suzuki.Poulose@xxxxxxx>; Mathieu Poirier <mathieu.poirier@xxxxxxxxxx>;
> Catalin Marinas <Catalin.Marinas@xxxxxxx>
> Subject: Re: [PATCH v2] drivers/perf: Enable PID_IN_CONTEXTIDR with SPE
>
> On Mon, Dec 14, 2020 at 10:45:02AM +0200, James Clark wrote:
> > Enable PID_IN_CONTEXTIDR by default when Arm SPE is enabled.
> > This flag is required to get PID data in the SPE trace. Without it the
> > perf tool will report 0 for PID which isn't very useful, especially
> > when doing system wide profiling or profiling applications that fork.
> >
> > There is a small performance overhead when enabling PID_IN_CONTEXTIDR,
> > but SPE itself is optional and not enabled by default so the impact is
> > minimised.
> >
> > Cc: Will Deacon <will@xxxxxxxxxx>
> > Cc: Mark Rutland <mark.rutland@xxxxxxx>
> > Cc: Al Grant <al.grant@xxxxxxx>
> > Cc: Leo Yan <leo.yan@xxxxxxxxxx>
> > Cc: John Garry <john.garry@xxxxxxxxxx>
> > Cc: Suzuki K Poulose <suzuki.poulose@xxxxxxx>
> > Cc: Mathieu Poirier <mathieu.poirier@xxxxxxxxxx>
> > Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
> > Signed-off-by: James Clark <james.clark@xxxxxxx>
> > ---
> > arch/arm64/Kconfig.debug | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/arm64/Kconfig.debug b/arch/arm64/Kconfig.debug index
> > 265c4461031f..b030bb21a0bb 100644
> > --- a/arch/arm64/Kconfig.debug
> > +++ b/arch/arm64/Kconfig.debug
> > @@ -2,6 +2,7 @@
> >
> > config PID_IN_CONTEXTIDR
> > bool "Write the current PID to the CONTEXTIDR register"
> > + default y if ARM_SPE_PMU
> > help
> > Enabling this option causes the kernel to write the current PID to
> > the CONTEXTIDR register, at the expense of some additional
>
> Given that PID_IN_CONTEXTIDR doesn't take PID namespacing into account,
> IIUC it's kinda broken today (and arguably removing that support would be
> better).
>
> Can we not track the (namespaced) PID in thte main ringbuffer regardless of
> PID_IN_CONTEXTIDR, and leave PID_IN_CONTEXTIDR as an external debug aid
> only?

The (namespaced) PID is already tracked in other perf records; the point
of putting PID in CONTEXTIDR is that SPE and ETM will capture it into the
AUX buffer, and this can be used to correlate AUX buffer events with
other perf events when the AUX buffer doesn't have hardware timestamps
that are sufficiently precise and can be converted to kernel time.

Right now, a kernel may be enabling PID_IN_CONTEXTIDR for other reasons,
and in a SPE or ETM perf session run from within a container, the AUX buffer
will be capturing root-namespace PIDs. This is wrong, as a container, and
an events created in a container's non-root PID namespace, should not
have visibility of root-namespace PIDs. The solution is for the SPE and ETM
PMU drivers to disable CONTEXTIDR capture if the event's PID namespace
is non-root. For SPE and ETM4, this is straightforward - both trace formats
are self-describing and perf's decoder can cope with CONTEXTIDR being
absent even if it was requested. For ETM3 and PTM (on older 32-bit cores)
this might not be the case. So it may be better to have perf_event_open
fail and have userspace retry with contextid sampling disabled.

This does mean perf sessions from within a container will lack PIDs in AUX
buffers (so correlation relying on that will fail), but that's better than having
these sessions expose root-namespace PIDs as they do now. perf sessions
from the root namespace just work.

(The alternative, of adjusting CONTEXTIDR to trace non-root-namespace PIDs,
doesn't work in general - the namespace belongs to the event tracing the
process, not the process being traced. Worst case, you might have a process
subject to an SPE event in one namespace and an ETM event in another -
there is no value of CONTEXTIDR that works for both.)

Al


> Making this default y is ARM_SPE_PMU implies it'll be on in all distro kernels, and
> I think we need to think harder before doing that.