Re: [RFC PATCH] perf, bpf: Retain kernel executable code in memory to aid Intel PT tracing
From: Ingo Molnar
Date: Mon Feb 11 2019 - 02:46:41 EST

* Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

> On Thu, Feb 07, 2019 at 01:19:01PM +0200, Adrian Hunter wrote:
> > Subject to memory pressure and other limits, retain executable code, such
> > as JIT-compiled bpf, in memory instead of freeing it immediately it is no
> > longer needed for execution.
> >
> > While perf is primarily aimed at statistical analysis, tools like Intel
> > PT can aim to provide a trace of exactly what happened. As such, corner
> > cases that can be overlooked statistically need to be addressed. For
> > example, there is a gap where JIT-compiled bpf can be freed from memory
> > before a tracer has a chance to read it out through the bpf syscall.
> > While that can be ignored statistically, it contributes to a death by
> > 1000 cuts for tracers attempting to assemble exactly what happened. This is
> > a bit gratuitous given that retaining the executable code is relatively
> > simple, and the amount of memory involved relatively small. The retained
> > executable code is then available in memory images such as /proc/kcore.
> >
> > This facility could perhaps be extended also to init sections.
> >
> > Note that this patch is compile tested only and, at present, is missing
> > the ability to retain symbols.
>
> You don't need the symbols; you already have them through
> PERF_RECORD_KSYMBOL.
>
> Also; afaict this patch guarantees exactly nothing. It registers a
> shrinker which will (given enough memory pressure) happily free your
> text before we get around to copying it out.
>
> Did you read this proposal?
>
> https://lkml.kernel.org/r/20190109101808.GG1900@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>
> (also: s/KCORE_QC/KCORE_QS/ for quiescent state)
>
> That would create an RCU like interface to /proc/kcore and give you the
> guarantees you need, while also allowing the memory to get freed once
> you've obtained a copy.

Yeah, adding a proper change-notification interface to /proc/kcore sounds
like a superior solution to trying to shoehorn this down perf's throat.
It's not like any of this is useful without having opened /proc/kcore.
Also, /proc/kcore is privileged, so the indefinite resource allocation
side effect in case user-space doesn't drain the lists is OK.

Thanks,

Ingo
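
For readers following the thread, the "retain until the reader reports a
quiescent state" idea can be modelled in plain user-space C. This is only a
rough sketch under stated assumptions, not the interface from the linked
proposal and not kernel code: struct text_retainer, retainer_retire(),
retainer_snapshot() and retainer_quiescent() are hypothetical names made up
for illustration, and malloc()/free() stand in for whatever the kernel would
use. Retired text is kept on a list tagged with an epoch; the reader takes a
snapshot, copies what it needs, then reports a quiescent state, at which
point everything retired up to that snapshot can be freed.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model: one retired chunk of executable text. */
struct retired_text {
	struct retired_text *next;
	unsigned long epoch;	/* epoch in which the text was retired */
	size_t len;
	unsigned char *code;	/* retained copy, stands in for a JIT image */
};

/* Hypothetical model: the list a /proc/kcore-like reader would drain. */
struct text_retainer {
	struct retired_text *head;	/* newest entry first */
	unsigned long epoch;		/* current epoch */
	size_t retained_bytes;
};

/* Retire text instead of freeing it. Returns 0 on success, -1 on OOM. */
int retainer_retire(struct text_retainer *r, const void *code, size_t len)
{
	struct retired_text *t = malloc(sizeof(*t));

	if (!t)
		return -1;
	t->code = malloc(len ? len : 1);
	if (!t->code) {
		free(t);
		return -1;
	}
	memcpy(t->code, code, len);
	t->len = len;
	t->epoch = r->epoch;
	t->next = r->head;
	r->head = t;
	r->retained_bytes += len;
	return 0;
}

/*
 * Reader starts a copy pass. Text retired after this call gets a newer
 * epoch and therefore survives the matching quiescent-state report.
 */
unsigned long retainer_snapshot(struct text_retainer *r)
{
	return r->epoch++;
}

/*
 * Reader reports a quiescent state for a snapshot: it holds its own copy
 * of everything retired up to and including that epoch, so those entries
 * can be freed. Returns the number of bytes released.
 */
size_t retainer_quiescent(struct text_retainer *r, unsigned long snap)
{
	struct retired_text **pp = &r->head;
	size_t freed = 0;

	while (*pp) {
		struct retired_text *t = *pp;

		if (t->epoch <= snap) {
			*pp = t->next;
			freed += t->len;
			free(t->code);
			free(t);
		} else {
			pp = &t->next;
		}
	}
	r->retained_bytes -= freed;
	return freed;
}
```

Note that in this model nothing is freed until the reader reports in, so
the list grows without bound if the reader never does. That is the
indefinite resource allocation mentioned above, and the reason it matters
that the reader is privileged.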