Re: [PATCH 1/2] perf_events: add cgroup support (v8)
From: Paul Menage
Date: Mon Feb 07 2011 - 14:29:36 EST
On Wed, Feb 2, 2011 at 4:46 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Wed, 2011-02-02 at 17:20 +0530, Balbir Singh wrote:
>> * Peter Zijlstra <peterz@xxxxxxxxxxxxx> [2011-02-02 12:29:20]:
>>
>> > On Thu, 2011-01-20 at 15:39 +0100, Peter Zijlstra wrote:
>> > > On Thu, 2011-01-20 at 15:30 +0200, Stephane Eranian wrote:
>> > > > @@ -4259,8 +4261,20 @@ void cgroup_exit(struct task_struct *tsk, int run_callbacks)
>> > > >
>> > > > /* Reassign the task to the init_css_set. */
>> > > > task_lock(tsk);
>> > > > + /*
>> > > > + * we mask interrupts to prevent:
>> > > > + * - timer tick to cause event rotation which
>> > > > + * could schedule back in cgroup events after
>> > > > + * they were switched out by perf_cgroup_sched_out()
>> > > > + *
>> > > > + * - preemption which could schedule back in cgroup events
>> > > > + */
>> > > > + local_irq_save(flags);
>> > > > + perf_cgroup_sched_out(tsk);
>> > > > cg = tsk->cgroups;
>> > > > tsk->cgroups = &init_css_set;
>> > > > + perf_cgroup_sched_in(tsk);
>> > > > + local_irq_restore(flags);
>> > > > task_unlock(tsk);
>> > > > if (cg)
>> > > > put_css_set_taskexit(cg);
>> > >
>> > > So you too need a callback on cgroup change there. Li, Paul, any chance
>> > > we can fix this cgroup_subsys::exit callback? The scheduler code needs
>> > > to do funny things because it's in the wrong place as well.
>> >
>> > cgroup guys? Shall I just fix this exit thing, since the only users seem
>> > to be the scheduler and now perf, for both of which it's unfortunate at
>> > best?
>>
>> Are you suggesting that the cgroup_exit on task_exit notification should be
>> pulled out?
>
>
> No, just fixed. The callback as it exists isn't useful and leads to
> hacks like the above.
>
>
>> > Balbir, memcontrol.c uses pre_destroy(). I pose that using this method
>> > is broken by definition, since it makes the cgroup empty notification
>> > void.
>> >
>>
>> We use pre_destroy() to reclaim, so that delete/rmdir() will be able
>> to clean up the node/group. I am not sure what you mean by it makes
>> the empty notification void and why pre_destroy() is broken?
>
> A quick look at the code looked like it could return -EBUSY (and other
> errors), in that case the rmdir of the empty cgroup will fail.
>
> Therefore it can happen that, after the last task is removed and we get
> the notification that the cgroup is empty, the rmdir we then attempt
> fails.
>
> This again means that all such notification handlers must poll state,
> which is ridiculous.
>
Not necessarily: we could have a failed rmdir() set a bit on the cgroup
that causes the notification to be sent again once the final refcount
on the cgroup is dropped.
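
Roughly what I have in mind, as a user-space model rather than a patch.
Everything below is hypothetical: struct model_cgroup, the
renotify_on_release bit and the model_*() helpers are names made up for
illustration and are not existing cgroup code. It only shows the ordering:
a failed rmdir() arms the bit, and dropping the last reference fires the
"empty" notification a second time.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel API. */
struct model_cgroup {
	int refcount;             /* extra references, e.g. held by a controller */
	int nr_tasks;             /* tasks still attached to the group */
	bool renotify_on_release; /* the "bit" armed by a failed rmdir() */
	int notifications;        /* how many times "empty" was reported */
};

/* Stand-in for whatever delivers the "cgroup is empty" notification. */
static void model_notify_empty(struct model_cgroup *cg)
{
	cg->notifications++;
}

/* Last task leaving the group triggers the first notification. */
static void model_task_exit(struct model_cgroup *cg)
{
	if (cg->nr_tasks > 0 && --cg->nr_tasks == 0)
		model_notify_empty(cg);
}

/* rmdir() fails while references remain, but remembers that it was tried. */
static int model_rmdir(struct model_cgroup *cg)
{
	if (cg->nr_tasks > 0)
		return -EBUSY;
	if (cg->refcount > 0) {
		cg->renotify_on_release = true;
		return -EBUSY;
	}
	return 0;
}

/* Dropping the final reference re-sends the notification if armed. */
static void model_put(struct model_cgroup *cg)
{
	assert(cg->refcount > 0);
	if (--cg->refcount == 0 && cg->renotify_on_release) {
		cg->renotify_on_release = false;
		model_notify_empty(cg);
	}
}
```

With that, a handler that gets the notification and sees rmdir() fail
does not have to poll; it just waits for the next notification and
retries.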
Paul