Re: [RFC][PATCH 8/9 v2] cgroup: avoid creating new cgroup under a cgroup being destroyed

From: Hiroyuki Kamezawa
Date: Fri Apr 27 2012 - 20:20:52 EST


On Sat, Apr 28, 2012 at 5:40 AM, Tejun Heo <tj@xxxxxxxxxx> wrote:
> On Fri, Apr 27, 2012 at 03:04:14PM +0900, KAMEZAWA Hiroyuki wrote:
>> When ->pre_destroy() is called, it should be guaranteed that
>> new child cgroup is not created under a cgroup, where pre_destroy()
>> is running. If not, ->pre_destroy() must check children and
>> return -EBUSY, which causes warning.
>>
>> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
>
> Hmm... I'm getting confused more.  Why do we need these cgroup changes
> at all?  cgroup still has cgrp->count check and
> cgroup_clear_css_refs() after pre_destroy() calls.  The order of
> changes should be,
>
> * Make memcg pre_destroy() not fail; however, pre_destroy() should
>  still be ready to be retried.  That's the defined interface.
>
> * cgroup core updated to drop pre_destroy() retrying and guarantee
>  that pre_destroy() invocation will happen only once.
>
> * memcg and other cgroups can update their pre_destroy() if the "won't
>  be retried" part can simplify their implementations.
>

What I thought was...
Assume a memory cgroup A with use_hierarchy==1.

1. thread:0 starts calling ->pre_destroy() of cgroup A.
2. thread:0 sometimes calls cond_resched() or other sleeping functions.
3. thread:1 creates a cgroup B under "A".
4. thread:1 attaches a thread X to cgroup A/B.
5. The res_counter of A is charged up, but pre_destroy() can't see what happened
because it scans only the LRU of A.
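
To make step 5 concrete, here is a rough user-space sketch, not kernel code.
All toy_* names are made up for illustration, and the real pre_destroy()
moves charges rather than simply dropping them. It only shows why a charge
made through child B stays in A's counter when the scan covers A's own LRU.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for res_counter: usage plus a parent link. */
struct toy_counter {
	long usage;
	struct toy_counter *parent;
};

/* Hypothetical stand-in for a memcg: a counter and its own LRU size. */
struct toy_memcg {
	struct toy_counter res;
	long lru_pages;		/* pages on this group's own LRU only */
};

/* Hierarchical charge: the page sits on cg's LRU, usage goes up to the root. */
static void toy_charge(struct toy_memcg *cg, long pages)
{
	struct toy_counter *c;

	assert(cg != NULL);
	cg->lru_pages += pages;
	for (c = &cg->res; c != NULL; c = c->parent)
		c->usage += pages;
}

/*
 * Simplified pre_destroy(): drop everything found on cg's own LRU, then
 * report -EBUSY if the counter is still non-zero.
 */
static int toy_pre_destroy(struct toy_memcg *cg)
{
	struct toy_counter *c;
	long freed;

	assert(cg != NULL);
	freed = cg->lru_pages;
	cg->lru_pages = 0;
	for (c = &cg->res; c != NULL; c = c->parent)
		c->usage -= freed;

	return cg->res.usage ? -EBUSY : 0;
}
```

If B is created under A and charged while A's pre_destroy() is in flight,
toy_pre_destroy() on A empties A's LRU but A's usage still holds B's pages.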

So, we have -EBUSY now. I considered some options to fix this.

option 1) just return 0 instead of -EBUSY when pre_destroy() finds a
task or a child.

There is a race... even if we return 0 here and expect the cgroup code
to catch it, the task or child we found may be moved to another cgroup
before cgroup's final check looks at it.
In that case, the cgroup will be freed before pre_destroy() has fully
completed, and the charges will be lost.

option 2) move all the code to ->destroy()
That was the previous version of this set.

This patch is option 3: prevent creation of a new child.
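
Roughly, the idea looks like the user-space sketch below. This is not the
patch itself: toy_cgroup, the removing flag and both helpers are
hypothetical, and the pthread mutex only stands in for whatever lock the
cgroup core holds on the mkdir and rmdir paths.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct cgroup; not the kernel layout. */
struct toy_cgroup {
	pthread_mutex_t lock;	/* stands in for the cgroup core lock */
	bool removing;		/* set while pre_destroy() is in flight */
	int nr_children;
};

#define TOY_CGROUP_INIT { PTHREAD_MUTEX_INITIALIZER, false, 0 }

/* rmdir path: mark the cgroup before pre_destroy() is called. */
static int toy_begin_destroy(struct toy_cgroup *cg)
{
	int ret = 0;

	assert(cg != NULL);
	pthread_mutex_lock(&cg->lock);
	if (cg->nr_children > 0 || cg->removing)
		ret = -EBUSY;
	else
		cg->removing = true;
	pthread_mutex_unlock(&cg->lock);
	return ret;
}

/* mkdir path: refuse a new child once the parent is being destroyed. */
static int toy_create_child(struct toy_cgroup *parent)
{
	int ret = 0;

	assert(parent != NULL);
	pthread_mutex_lock(&parent->lock);
	if (parent->removing)
		ret = -EBUSY;
	else
		parent->nr_children++;
	pthread_mutex_unlock(&parent->lock);
	return ret;
}
```

Because the flag is set and tested under the same lock, a child is either
visible before pre_destroy() starts or is never created at all.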

If you don't like this, I'll move all the code to ->destroy() and make it
asynchronous again.

Thanks,
-Kame