Re: [PATCH 1/2] memcg: dirty pages accounting and limiting infrastructure

From: Minchan Kim
Date: Fri Feb 26 2010 - 01:35:20 EST


On Fri, Feb 26, 2010 at 3:15 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
> On Fri, 26 Feb 2010 14:53:39 +0900
> Minchan Kim <minchan.kim@xxxxxxxxx> wrote:
>
>> On Fri, Feb 26, 2010 at 2:01 PM, KAMEZAWA Hiroyuki
>> <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
>> > Hi,
>> >
>> > On Fri, 26 Feb 2010 13:50:04 +0900
>> > Minchan Kim <minchan.kim@xxxxxxxxx> wrote:
>> >
>> >> > Hm ? I don't read the whole thread but can_attach() is called under
>> >> > cgroup_mutex(). So, it doesn't need to use RCU.
>> >>
>> >> Vivek mentioned memcg is protected by RCU if I understand his intention right.
>> >> So I commented that without enough knowledge of memcg.
>> >> After your comment, I dive into the code.
>> >>
>> >> Just out of curiosity.
>> >>
>> >> Really, memcg is protected by RCU?
>> > yes. All cgroup subsystem is protected by RCU.
>> >
>> >> I think most of RCU around memcg is for protecting task_struct and
>> >> cgroup_subsys_state.
>> >> The memcg is protected by cgroup_mutex as you mentioned.
>> >> Am I missing something?
>> >
>> > There are several levels of protections.
>> >
>> > cgroup subsystem's ->destroy() call back is finally called by
>> >
>> > As this.
>> >
>> > Â768 Â Â Â Â Â Â Â Â synchronize_rcu();
>> > Â769
>> > Â770 Â Â Â Â Â Â Â Â mutex_lock(&cgroup_mutex);
>> > Â771 Â Â Â Â Â Â Â Â /*
>> > Â772 Â Â Â Â Â Â Â Â Â* Release the subsystem state objects.
>> > Â773 Â Â Â Â Â Â Â Â Â*/
>> > Â774 Â Â Â Â Â Â Â Â for_each_subsys(cgrp->root, ss)
>> > Â775 Â Â Â Â Â Â Â Â Â Â Â Â ss->destroy(ss, cgrp);
>> > Â776
>> > Â777 Â Â Â Â Â Â Â Â cgrp->root->number_of_cgroups--;
>> > Â778 Â Â Â Â Â Â Â Â mutex_unlock(&cgroup_mutex);
>> >
>> > Before here,
>> > Â Â Â Â- there are no tasks under this cgroup (cgroup's refcnt is 0)
>> > Â Â Â Â Â&& cgroup is marked as REMOVED.
>> >
>> > Then, this access
>> > Â Â Â Ârcu_read_lock();
>> > Â Â Â Âmem = mem_cgroup_from_task(task);
>> > Â Â Â Âif (css_tryget(mem->css)) Â <===============checks cgroup refcnt
>>
>> If it it, do we always need css_tryget after mem_cgroup_from_task
>> without cgroup_mutex to make sure css is vaild?
>>
> On a case by cases.
>
>> But I found several cases that don't call css_tryget
>>
>> 1. mm_match_cgroup
>> It's used by page_referenced_xxx. so we I think we don't grab
>> cgroup_mutex at that time.
>>
> yes. but all check are done under RCU. And this function never
> access contents of memory which pointer holds.
> And, please conider the whole context.
>
> Â Â Â Âmem_cgroup_try_charge()
> Â Â Â Â Â Â Â Âmem_cout =..... (refcnt +1)
> Â Â Â Â Â Â Â Â....
> Â Â Â Â Â Â Â Âtry_to_free_mem_cgroup_pages(mem_cont)
> Â Â Â Â Â Â Â Â....
> Â Â Â Â Â Â Â Âshrink_xxx_list()
> Â Â Â Â Â Â Â Â....
> Â Â Â Â Â Â Â Â Â Â Â Âpage_referenced_anon(page, mem_cont)
> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Âmm_match_cgroup(mm, mem_cont)
>
> Then, this mem_cont (2nd argument to mm_match_cgroup) is always valid.
> Â Â Â Ârcu_read_lock();
> Â Â Â Âmemcg = mem_cgroup_from_task(rcu_dereference(mm->ownder));
> Â Â Â Ârcu_read_unlock();
> Â Â Â Âreturn memcg != mem_cont;
>
> Here,
> Â Â Â Â1. mem_cont is never reused. (because refcnt+1)
> Â Â Â Â2. we don't access memcg's contents.
>
> I think even rcu_read_lock()/unlock() is unnecessary.
>
>
>
>> 2. mem_cgroup_oom_called
>> I think in here we don't grab cgroup_mutex, too.
>>
> In OOM-killer context, memcg which causes OOM has refcnt +1.
> Then, not necessary.
>
>
>> I guess some design would cover that problems.
> Maybe.
>
>> Could you tell me if you don't mind?
>> Sorry for bothering you.
>>
>
> In my point of view, the most terrible porblem is heavy cost of
> css_tryget() when you run multi-thread heavy program.
> So, I want to see some inovation, but haven't find yet.
>
> I admit this RCU+refcnt is tend to be hard to review. But it's costly
> operation to take refcnt and it's worth to be handled in the best
> usage of each logics, on a case by cases for now.
>

Thanks for always kind explanation, Kame.

> Thanks,
> -Kame
>
>
>



--
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/