Re: [PATCH v4 0/3] perf: add support for analyzing events for containers

From: Krister Johansen
Date: Wed Dec 28 2016 - 20:41:49 EST


On Fri, Dec 16, 2016 at 12:06:55AM +0530, Hari Bathini wrote:
> This patch-set overcomes this limitation by using cgroup identifier as
> container unique identifier. A new PERF_RECORD_NAMESPACES event that
> records namespaces related info is introduced, from which the cgroup
> namespace's device & inode numbers are used as cgroup identifier. This
> is based on the assumption that each container is created with it's own
> cgroup namespace allowing assessment/analysis of multiple containers
> using cgroup identifier.

Why choose cgroups when the kernel dispenses namespace-unique
identifiers? Cgroup membership can be arbitrary. Moreover, cgroup and
namespace destruction are handled by separate subsystems. It's possible
for a cgroup notifier to run before network namespace teardown
occurs.

If it were me, I'd re-use the existing convention to identify the
namespaces you want to monitor. The code in nsenter(1) can take a
namespace that's been bind-mounted onto a file, or extract the ns
information from a task's entry in /proc.
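
To make that convention concrete, here is a rough sketch (mine, not
nsenter's code and not anything from the patch set) of how either kind
of reference resolves to the same identifier. The helper names
ns_identity and ns_path_for_task are made up for illustration. The only
real interfaces assumed are stat(2) and the /proc/<pid>/ns/<type>
layout; nsenter itself goes further and calls setns(2) on the opened
file, which this sketch does not do.

```python
import os


def ns_path_for_task(pid, ns_type="net"):
    """Build the /proc path that names one of a task's namespaces.

    Hypothetical helper: pid may be an integer or the string "self".
    """
    return "/proc/{}/ns/{}".format(pid, ns_type)


def ns_identity(path):
    """Return the (device, inode) pair of the file at path.

    Hypothetical helper. For a namespace file, either a
    /proc/<pid>/ns/<type> entry or a bind mount of one, stat() follows
    through to the underlying inode, so two paths that name the same
    namespace should yield the same pair.
    """
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


if __name__ == "__main__":
    # Only meaningful on a Linux host with /proc mounted.
    path = ns_path_for_task("self", "net")
    if os.path.exists(path):
        print(path, ns_identity(path))
```

The point is that the user hands over a path or a pid, and the tool
derives the identifier itself, so nothing about cgroup membership needs
to be assumed.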

My biggest concern is how the sample data is handled after it has been
collected. Neither namespaces nor cgroups survive reboots. Will the
records contain all the persistent state needed to run a report or
script command at a later date?

Does this code attempt to enter alternate namespaces in order to record
stack/symbol information for a '-g' style trace? If so, how are you
holding on to that information? There's no guarantee that a particular
container will be alive or have its filesystems reachable from the host
if the trace data is evaluated at a later time.

-K