Re: [PATCH 1/2] perf_events: add cgroup support (v8)

From: Stephane Eranian
Date: Thu Feb 10 2011 - 06:47:23 EST


On Wed, Feb 9, 2011 at 10:47 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Tue, 2011-02-08 at 23:31 +0100, Stephane Eranian wrote:
>> Peter,
>>
>> See comments below.
>>
>>
>> On Mon, Feb 7, 2011 at 5:10 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>> > Compile tested only, depends on the cgroup::exit patch
>> >
>> > --- linux-2.6.orig/include/linux/perf_event.h
>> > +++ linux-2.6/include/linux/perf_event.h
>> > @@ -905,6 +929,9 @@ struct perf_cpu_context {
>> >    Âstruct list_head        Ârotation_list;
>> >    Âint               jiffies_interval;
>> >    Âstruct pmu           Â*active_pmu;
>> > +#ifdef CONFIG_CGROUP_PERF
>> > +    struct perf_cgroup       Â*cgrp;
>> > +#endif
>> > Â};
>> >
>> I don't quite understand the motivation for adding cgrp to cpuctx.
>>
>> > --- linux-2.6.orig/kernel/perf_event.c
>> > +++ linux-2.6/kernel/perf_event.c
>> > +static inline void update_cgrp_time_from_cpuctx(struct perf_cpu_context *cpuctx)
>> > +{
>> > + Â Â Â struct perf_cgroup *cgrp_out = cpuctx->cgrp;
>> > + Â Â Â if (cgrp_out)
>> > + Â Â Â Â Â Â Â __update_cgrp_time(cgrp_out);
>> > +}
>> > +
>> What's the benefit of this form compared to the original from_task() version?
>
> Answering both questions, I did this so we could still do the
> sched_out() while the task has already been flipped to a new cgroup.
> Note that both attach and the new exit cgroup_subsys methods are called
> after they update the task's cgroup. While they do provide the old
> cgroup as an argument, making use of that requires passing that along
> which would have been a much more invasive change.
>
>
>> > + Â Â Â Â Â Â Â Â Â Â Â if (mode & PERF_CGROUP_SWOUT)
>> > + Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â cpu_ctx_sched_out(cpuctx, EVENT_ALL);
>> > +
>> > + Â Â Â Â Â Â Â Â Â Â Â if (mode & PERF_CGROUP_SWIN) {
>> > + Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â cpu_ctx_sched_in(cpuctx, EVENT_ALL, task, 1);
>> > + Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â cpuctx->cgrp = perf_cgroup_from_task(task);
>> > + Â Â Â Â Â Â Â Â Â Â Â }
>> > + Â Â Â Â Â Â Â }
>> I think there is a risk on cpuctx->cgrp pointing to stale cgrp information.
>> Shouldn't we also set cpuctx->cgrp = NULL on SWOUT?
>
> Yeah, we probably should.
>
>> > +static int __perf_cgroup_move(void *info)
>> > +{
>> > + Â Â Â struct task_struct *task = info;
>> > + Â Â Â perf_cgroup_switch(task, PERF_CGROUP_SWOUT | PERF_CGROUP_SWIN);
>> > + Â Â Â return 0;
>> > +}
>> > +
>> > +static void perf_cgroup_move(struct task_struct *task)
>> > +{
>> > + Â Â Â task_function_call(task, __perf_cgroup_move, task);
>> > +}
>> > +
>> > +static void perf_cgroup_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
>> > + Â Â Â Â Â Â Â struct cgroup *old_cgrp, struct task_struct *task,
>> > + Â Â Â Â Â Â Â bool threadgroup)
>> > +{
>> > + Â Â Â perf_cgroup_move(task);
>> > + Â Â Â if (threadgroup) {
>> > + Â Â Â Â Â Â Â struct task_struct *c;
>> > + Â Â Â Â Â Â Â rcu_read_lock();
>> > + Â Â Â Â Â Â Â list_for_each_entry_rcu(c, &task->thread_group, thread_group) {
>> > + Â Â Â Â Â Â Â Â Â Â Â perf_cgroup_move(c);
>> > + Â Â Â Â Â Â Â }
>> > + Â Â Â Â Â Â Â rcu_read_unlock();
>> > + Â Â Â }
>> > +}
>> > +
>> I suspect my original patch was not necessarily handling the attach completely
>> when you move an existing task into a cgroup which was already monitored.
>> I think you may have had to wait until a ctxsw. Looks like this callback handles
>> this better.
>
> Right, this deals with moving a task into a cgroup that isn't currently
> being monitored and its converse, moving it out of a cgroup that is
> being monitored.
>
>> Let me make sure I understand the threadgroup iteration, though. I suspect
>> this handles the situation where a multi-threaded app is moved into a cgroup
>
> Indeed.
>
>> while there is already cgroup monitoring active. In that case and if we do not
>> want to wait until there is at least one ctxsw on all CPUs, then we have to
>> check if the other threads are not already running on the other CPUs.If so,
>> we need to do a cgroup switch on those CPUs. Otherwise, we have nothing to
>> do. Am I getting this right?
>
> Right, so if any of those tasks is currently running, that cpu will be
> monitoring their old cgroup, hence we send an IPI to flip cgroups.
>
I have built a test case where this would trigger. I launched a multi-threaded
app, and then I move the pid into a cgroup via: echo PID >/cgroup/tests/tasks.
I don't see any perf_cgroup move beyond the PID passed.

I looked at kernel/cgroup.c and I could not find a invocation of
ss->attach() that
would pass threadgroup = true. So I am confused here.

I wonder how the cgroupfs 'echo PID >tasks' interface would make the distinction
between PID and TID. It seems possible to move one thread of a multi-threaded
process into a cgroup but not the others.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/