Re: [PATCH v2 tip/perf/core 1/6] perf symbols: find symbols in different mount namespace

From: Krister Johansen
Date: Mon Jul 10 2017 - 19:30:35 EST

On Mon, Jul 10, 2017 at 07:52:49PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, Jul 10, 2017 at 03:39:25PM -0700, Krister Johansen escreveu:
> > On Mon, Jul 10, 2017 at 08:17:00AM +0200, Thomas-Mich Richter wrote:
> > > On 07/07/2017 09:36 PM, Krister Johansen wrote:
> > > > On Thu, Jul 06, 2017 at 04:41:30PM -0300, Arnaldo Carvalho de Melo wrote:
> > > >> Em Wed, Jul 05, 2017 at 06:48:08PM -0700, Krister Johansen escreveu:
> > > >>> Teach perf how to resolve symbols from binaries that are in a different
> > > >>> mount namespace from the tool. This allows perf to generate meaningful
> > > >>> stack traces even if the binary resides in a different mount namespace
> > > >>> from the tool.
> > > >>
> > > >> I was trying to find a way to test after applying each of the patches in
> > > >> this series, when it occurred to me that if a process that appears on a
> > > >> file has exited, how can we access /proc/%ITS_PID/something?
> > > >
> > > > You're correct. We can't access /proc/<PID>/whatever once the process
> > > > has exited. That was the impetus for patches 4 and 6, which allow us
> > > > to capture the binary (and debuginfo, if it exists) into the buildid
> > > > cache so that if we do have a trace that exists after a process or
> > > > container exits, we'll still be able to resolve some of the symbols.
> > > Any ideas on how to extend this to be able to resolve symbols after
> > > the process/container exited?
> > > I believe it boils down to how to interpret the mnt inode number in the
> > > Can this be done post-mortem? Maybe the PERF_RECORD_NAMESPACE record
> > > has to contain more data than just the inode number?
> > I think we're talking past one another. If the container exits then the
> > inode numbers that identify mount namespace are referring to something
> > that is no longer valid. There's no mount namespace to enter in order
> > to locate the binary objects. They may be on a volume that's no longer
> > mounted.
> > I have a pair of patches in the existing set that copies the binary
> > objects into the buildid cache. This lets you resolve the symbols after
> > the container has exited, provided that you recorded the buildids during
> > the trace.
> > If you apply all the patches in this set, you should be able to generate
> > traces that you can look at with script or report even after the process
> > has exited. I've been able to do it in my tests, at least.
> I will work on testing them soon; I just wanted this discussion to take
> place. What you did seems to be the best we can do with the existing
> kernel infrastructure, and is a clear advance, so we need to test and
> merge it.

Happy to have the discussion. Apologies if having the patches
iteratively add to one another isn't the best way to have this reviewed
and understood. If you just apply the first few, you don't get the
support to pull these into the build-id cache.

> Getting the build-ids for the binaries is the key here, then it's just a
> matter of populating a database from which to get the matching binaries;
> we wouldn't even need to copy the actual binaries at record time.

Unfortunately, it's not sufficient to save the path to the target binary,
because once the container exits and its namespace is destroyed, there
may be no path that tells the host how to access the files that were in
the container. Two different interactions here frustrate this:

1. Containers run under a pivoted root, so the container's view of a
path may differ from the host's view of it. E.g. /usr/bin/node in the
container may actually be /var/container_a/root/usr/bin/node on the host,
or something like that. However, see #2.

2. It's also entirely possible for a container to have mounted a
filesystem that's not accessible or mounted from the host. If, for
example, you're using docker with the direct-lvm storage driver, then
your storage device may be mounted in the vfs attached to the container,
but have no mount in the host's vfs. In a situation like this, once the
container exits, that LVM filesystem is unmounted. In order to access
the files in that container, you basically need to setns(2) into the
container's mount namespace and look up the files using a path that
resolves in the mount namespace of perf's target.