Re: [PATCH v2 tip/perf/core 1/6] perf symbols: find symbols in different mount namespace

From: Arnaldo Carvalho de Melo
Date: Tue Jul 11 2017 - 08:51:27 EST

On Mon, Jul 10, 2017 at 04:29:43PM -0700, Krister Johansen wrote:
> On Mon, Jul 10, 2017 at 07:52:49PM -0300, Arnaldo Carvalho de Melo wrote:
> > I will work on testing them soon, I just wanted this discussion to take
> > place, what you did seems to be the best we can do with the existing
> > kernel infrastructure, and is a clear advance, so we need to test and
> > merge it.

> Happy to have the discussion. Apologies if having the patches
> iteratively add to one another isn't the best way to have this reviewed
> and understood. If you just apply the first few, you don't get the
> support to pull these into the build-id cache.

> > Getting the build-ids for the binaries is the key here, then it's just a
> > matter of populating a database where to get the matching binaries, we
> > wouldn't need even to copy the actual binaries at record time.

> Unfortunately, it's not sufficient to save the path to the target binary
> because it's possible that after the container exits, and the namespace

The path is not that important, as "/usr/lib64/" is not
enough to uniquely identify a binary, for instance, here in this machine
I have:

[root@jouet ~]# ls -la /root/.debug/usr/lib64/
total 16
drwxr-xr-x. 4 root root 4096 Jun 29 15:46 .
drwxr-xr-x. 40 root root 4096 Jul 7 12:28 ..
drwxr-xr-x. 2 root root 4096 Jun 29 15:46 1c80f527d122e71f3dd3bd7d7f8a00a80143ae53
drwxr-xr-x. 2 root root 4096 Jun 23 10:43 b0fa2afea4d9239b66a0a260cbaceb1b9532299a
[root@jouet ~]#

[root@jouet ~]# file /root/.debug/usr/lib64/*/elf
/root/.debug/usr/lib64/ ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/, BuildID[sha1]=1c80f527d122e71f3dd3bd7d7f8a00a80143ae53, for GNU/Linux 2.6.32, not stripped
/root/.debug/usr/lib64/ ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/, BuildID[sha1]=b0fa2afea4d9239b66a0a260cbaceb1b9532299a, for GNU/Linux 2.6.32, not stripped
[root@jouet ~]# o

[root@jouet ~]# readelf -sW /root/.debug/usr/lib64/ > /tmp/a
[root@jouet ~]# readelf -sW /root/.debug/usr/lib64/ > /tmp/b
[root@jouet ~]# diff -u /tmp/a /tmp/b | wc -l
[root@jouet ~]# diff -u /tmp/a /tmp/b | head
@@ -13,298 +13,298 @@
9: 0000000000000000 0 OBJECT GLOBAL DEFAULT UND _dl_argv@GLIBC_PRIVATE (27)
10: 000000000009fbd0 29 FUNC GLOBAL DEFAULT 13 __strspn_c1@xxxxxxxxxxx
11: 0000000000072690 333 FUNC GLOBAL DEFAULT 13 putwchar@@GLIBC_2.2.5
- 12: 00000000001195c0 19 FUNC GLOBAL DEFAULT 13 __gethostname_chk@@GLIBC_2.4
+ 12: 0000000000119630 19 FUNC GLOBAL DEFAULT 13 __gethostname_chk@@GLIBC_2.4
13: 000000000009fbf0 37 FUNC GLOBAL DEFAULT 13 __strspn_c2@xxxxxxxxxxx
- 14: 0000000000132e80 192 FUNC GLOBAL DEFAULT 13 setrpcent@@GLIBC_2.2.5
[root@jouet ~]#

We need to get the content-based unique identifier for a binary as soon
as possible, then try to use just that, not the pathname.

> is destroyed, there may be no path that describes to the host how to
> access the files in the container. There are two different interactions

Right, we need to use the build-id and look it up in a database
populated somehow.

perf right now, by default, collects the build-ids in a table, at the
end of the recording session, trying not to disrupt the monitored
workload by not processing anything, just reading from the buffers and
dumping to a file.

It will also try to populate the build-id cache, first trying to make a
hardlink to the binary and copying it if that fails.
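
To make the hardlink-then-copy fallback concrete, here is a minimal
sketch in Python. It is not perf's actual code; the function name
cache_binary and the destination layout are hypothetical, chosen only
for illustration.

```python
import os
import shutil


def cache_binary(src, dst):
    """Place src at dst: hardlink when possible, copy otherwise.

    Returns "link" or "copy" so the caller can tell which path was taken.
    (Hypothetical helper, not part of perf.)
    """
    parent = os.path.dirname(dst)
    if parent:
        os.makedirs(parent, exist_ok=True)
    try:
        # Cheap and instant, but fails e.g. across filesystems (EXDEV)
        # or when dst already exists.
        os.link(src, dst)
        return "link"
    except OSError:
        # Fall back to a full copy of the file contents and metadata.
        shutil.copy2(src, dst)
        return "copy"
```

The hardlink is tried first because it costs no extra disk space and no
I/O on the file contents; the copy is only paid for when the link is not
possible.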

If we can get the build-id at the time of the mmap(binary), as part of
the loading of binaries, that would be ideal, as we're touching the file
headers anyway and the build-id is a small enough cookie.
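
To show how small that cookie is, here is a rough sketch of pulling the
build-id out of raw ELF note data. It assumes the caller has already
located the note section or PT_NOTE segment and that notes are padded to
4 bytes, which is the usual case for the GNU build-id note; the function
name build_id_from_notes is hypothetical.

```python
import struct

NT_GNU_BUILD_ID = 3  # note type the GNU toolchain uses for build-ids


def build_id_from_notes(notes, little_endian=True):
    """Return the build-id as a hex string, or None if no such note exists.

    `notes` is the raw content of an ELF note section or PT_NOTE segment.
    Assumes 4-byte padding of the name and descriptor fields.
    """
    fmt = "<III" if little_endian else ">III"
    off = 0
    while off + 12 <= len(notes):
        # Each note starts with three 32-bit words: namesz, descsz, type.
        namesz, descsz, ntype = struct.unpack_from(fmt, notes, off)
        off += 12
        name = notes[off:off + namesz]
        off += (namesz + 3) & ~3   # name is padded to a 4-byte boundary
        desc = notes[off:off + descsz]
        off += (descsz + 3) & ~3   # so is the descriptor
        if (ntype == NT_GNU_BUILD_ID and name == b"GNU\0"
                and len(desc) == descsz):
            return desc.hex()
    return None
```

For a SHA-1 build-id the descriptor is 20 bytes, so the whole note is
only 36 bytes including its header and name.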

But again, we should first try to go as far as we can with the
infrastructure we have in the kernel and tooling libraries, lots of
workloads will be serviced just fine with that.

> here that frustrate this:
> 1. Containers run under a pivoted root, so the container's view of the
> path may be different from the host's view of the path. E.g. /usr/bin/node
> in the container may actually be /var/container_a/root/usr/bin/node, or
> something like that. However, see #2.
> 2. It's also entirely possible for a container to have mounted a
> filesystem that's not accessible or mounted from the host. If, for
> example, you're using docker with the direct-lvm storage driver, then
> your storage device may be mounted in the vfs attached to the container,
> but have no mount in the host's vfs. In a situation like this, once the
> container exits, that lvm filesystem is unmounted. In order to
> access the files in that container, you basically need to setns(2) into
> the container's mount namespace and look up the files using a path
> that resolves in the mount namespace of perf's target.

That all frustrates accessing the binary via a pathname, agreed.
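
For reference, the setns(2) approach described in point 2 could look
roughly like the sketch below. This is not the code from the patch
series; the helper names are hypothetical. It assumes Python 3.12 or
later (for os.setns), sufficient privileges (CAP_SYS_ADMIN and
CAP_SYS_CHROOT), and a process that does not share filesystem attributes
with other threads, since the kernel refuses to switch mount namespaces
otherwise.

```python
import os


def mnt_ns_path(pid):
    """Path of the mount-namespace handle for `pid` under /proc."""
    return f"/proc/{pid}/ns/mnt"


def read_in_mount_ns(pid, path):
    """Read `path` as resolved inside the mount namespace of `pid`.

    Hypothetical helper: enters the target namespace, reads the file,
    and always switches back to the original namespace afterwards.
    """
    # Open both handles before switching, while /proc is still ours.
    own = os.open(mnt_ns_path(os.getpid()), os.O_RDONLY)
    try:
        target = os.open(mnt_ns_path(pid), os.O_RDONLY)
        try:
            os.setns(target, os.CLONE_NEWNS)
            try:
                with open(path, "rb") as f:
                    return f.read()
            finally:
                os.setns(own, os.CLONE_NEWNS)  # return to our namespace
        finally:
            os.close(target)
    finally:
        os.close(own)
```

Note that this only works while the target task is still alive, which is
exactly why the build-id needs to be captured, and ideally the binary
cached, before the container exits.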

- Arnaldo