Re: [PATCH v4 0/3] perf: add support for analyzing events for containers

From: Hari Bathini
Date: Tue Jan 03 2017 - 06:28:26 EST


Hi Krister,

Thanks for reviewing..


On Thursday 29 December 2016 07:11 AM, Krister Johansen wrote:
On Fri, Dec 16, 2016 at 12:06:55AM +0530, Hari Bathini wrote:
This patch-set overcomes this limitation by using cgroup identifier as
container unique identifier. A new PERF_RECORD_NAMESPACES event that
records namespaces related info is introduced, from which the cgroup
namespace's device & inode numbers are used as cgroup identifier. This
is based on the assumption that each container is created with it's own
cgroup namespace allowing assessment/analysis of multiple containers
using cgroup identifier.
Why choose cgroups when the kernel dispenses namespace-unique
identifiers. Cgroup membership can be arbitrary. Moreover, cgroup and

Agreed. But doesn't that hold for any other namespace or a combination
of namespaces as well?

namespace destruction are handled by separate subsystems. It's possible
to have a cgroup notifier run prior to network namespace teardown
occurring.
If it were me, I'd re-use existing convention to identify the namespaces
you want to monitor. The code in nsenter(1) can take a namespace that's
been bind mount'd on a file, or extract the ns information from a task
in /procfs.

As PERF_RECORD_NAMESPACES gets all namespaces info, I assume your reservation
is with how perf report (patch 3/3) is processed, which could be improved to consider
other namespaces as well, once PERF_RECORD_NAMESPACES is in.

My biggest concern is how the sample data is handled after it has been
collected. Both namespaces and cgroups don't survive reboots. Will the
records will contain all the persistent state needed to run a report or
script command at a later date?

Yes. Sideband events are emitted when a new thread is created or an existing
thread's namespaces are changed, which can be post processed for reporting.

Does this code attempt to enter alternate namespaces in order to record
stack/symbol information for a '-g' style trace? If so, how are you
holding on to that information? There's no guarantee that a particular
container will be alive or have its filesystems reachable from the host
if the trace data is evaluated at a later time.

I am not altering the existing behavior of -g option with these patches..

Thanks
Hari