Re: [RFC PATCH 2/2] perf: Filter events based on perf-namespace

From: Aravinda Prasad
Date: Tue Jul 12 2016 - 12:04:41 EST




On Tuesday 12 July 2016 07:57 PM, Peter Zijlstra wrote:
> On Tue, Jul 12, 2016 at 08:55:17AM -0500, Eric W. Biederman wrote:
>
>> I completely misread the description of this, or I would have said
>> something earlier. For some reason I thought he was talking about the
>> perf controller.
>>
>> As I recall the tricky part of this was to have tracing that was safe
>> and usable inside of a container. If you can align a perf cgroup with
>> your container that is probably sufficient for the selection of
>> processes.
>>

Aligning a cgroup with the container is sufficient if containers are
created with a PID namespace. The first prototype was based on that.

However, I am not sure it is fair to assume that containers are always
created with a PID namespace and that the processes inside a container
are grouped into a cgroup, since containers can be created without a
PID namespace. In fact, it was mentioned at the LPC container
micro-conference that some containers are created without a PID
namespace because they need to access host PIDs.

With the recent introduction of the cgroup namespace, I think we can
even drop the requirement that a container be created with a PID
namespace in order to enable safe tracing inside it. We are currently
evaluating that.
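
For reference, the cgroup-aligned approach could look roughly like the
sketch below from userspace. It assumes cgroup_fd is an open file
descriptor for the container's directory in the perf_event cgroup
hierarchy. The helper names init_cycles_attr() and open_cgroup_event()
are hypothetical and only for illustration. Cgroup mode applies to
per-CPU events, so a CPU has to be given.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: fill in a disabled CPU-cycles counter. */
static void init_cycles_attr(struct perf_event_attr *attr)
{
	assert(attr != NULL);
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
	attr->disabled = 1;
}

/*
 * Hypothetical helper: open the event for every task in one cgroup.
 * With PERF_FLAG_PID_CGROUP the "pid" argument is interpreted as a
 * file descriptor of the cgroup directory, and cpu must be >= 0.
 * Returns the event fd, or -1 with errno set.
 */
static long open_cgroup_event(struct perf_event_attr *attr,
			      int cgroup_fd, int cpu)
{
	return syscall(SYS_perf_event_open, attr, cgroup_fd, cpu,
		       -1 /* group_fd */, PERF_FLAG_PID_CGROUP);
}
```

A caller would open one such event per CPU of interest. This usually
needs extra privileges, which is part of the safety question above.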


>> At the same time there is a real desire to have identifiers like pids
>> translated into the appropriate form for inside of the container.
>> Without that translation they are meaningless inside a container.
>> Further it is necessary to be certain the tracing that is used is
>> safe for unprivileged users.

The PID is already translated inside a container, as mentioned below.

>>
>> I don't think I ever suggested or approved of the concept of a perf
>> namespace and that sounds a bit dubious to me.

Yes, true; this was not suggested during the discussion.

As the kernel does not have the concept of a container, we thought
that introducing a perf-namespace could isolate events inside a
container, analogous to the other namespaces.

>
> So perf uses the pid-namespace of the event-creator to report PID/TID
> numbers in.
>
> So sys_perf_event_open() -> perf_event_alloc() does
> get_pid_ns(task_active_pid_ns(current)) to set event->ns and then we do:
> task_{tgid,pid}_nr_ns(p, event->ns) to report the PID/TID resp., see
> perf_event_{pid,tid}().
>
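
To illustrate the translation described above, here is a minimal
userspace sketch of the kind of lookup that task_{tgid,pid}_nr_ns()
ends up doing. It is a toy model, not kernel code: the toy_* names and
TOY_MAX_LEVEL are made up for illustration, and the reference counting
and RCU handling around the real struct pid are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

#define TOY_MAX_LEVEL 4	/* assumed nesting limit for this sketch */

/* A PID namespace knows its nesting depth; the host is level 0. */
struct toy_pid_ns {
	int level;
	struct toy_pid_ns *parent;
};

/* One (number, namespace) pair per level a task is visible in. */
struct toy_upid {
	pid_t nr;
	const struct toy_pid_ns *ns;
};

struct toy_pid {
	int level;	/* deepest namespace the task lives in */
	struct toy_upid numbers[TOY_MAX_LEVEL];
};

/*
 * Return the PID as seen from namespace ns, or 0 if the task is not
 * visible there (ns is deeper than the task, or on another branch).
 */
static pid_t toy_pid_nr_ns(const struct toy_pid *pid,
			   const struct toy_pid_ns *ns)
{
	if (!pid || !ns)
		return 0;
	assert(ns->level >= 0 && ns->level < TOY_MAX_LEVEL);
	if (ns->level <= pid->level) {
		const struct toy_upid *upid = &pid->numbers[ns->level];

		if (upid->ns == ns)
			return upid->nr;
	}
	return 0;
}
```

In this model an event created inside the container would carry the
container's namespace as event->ns, so a task in that container is
reported with its container-local number, while a host-only task maps
to 0.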

--
Regards,
Aravinda