Re: [RFC][PATCH 1/7] memcg: use css_get in sock_update_memcg()

From: Glauber Costa
Date: Fri Apr 05 2013 - 04:08:13 EST


On 04/03/2013 07:29 PM, Michal Hocko wrote:
> On Wed 03-04-13 16:58:48, Glauber Costa wrote:
>> On 04/03/2013 01:11 PM, Li Zefan wrote:
>>> Use css_get/css_put instead of mem_cgroup_get/put.
>>>
>>> Note, if at the same time someone is moving @current to a different
>>> cgroup and removing the old cgroup, css_tryget() may return false,
>>> and sock->sk_cgrp won't be initialized.
>>>
>>> Signed-off-by: Li Zefan <lizefan@xxxxxxxxxx>
>>> ---
>>> mm/memcontrol.c | 8 ++++----
>>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>> index 23d0f6e..43ca91d 100644
>>> --- a/mm/memcontrol.c
>>> +++ b/mm/memcontrol.c
>>> @@ -536,15 +536,15 @@ void sock_update_memcg(struct sock *sk)
>>> */
>>> if (sk->sk_cgrp) {
>>> BUG_ON(mem_cgroup_is_root(sk->sk_cgrp->memcg));
>>> - mem_cgroup_get(sk->sk_cgrp->memcg);
>>> + css_get(&sk->sk_cgrp->memcg->css);
>
> I am not sure I understand this one. So we have a goup here (which means
> that somebody already took a reference on it, right?) and we are taking
> another reference. If this is released by sock_release_memcg then who
> releases the previous one? It is not directly related to the patch
> because this has been done previously already. Could you clarify
> Glauber, please?

This should be documented in the commit that introduced this, and it was
one of the first bugs I've handled with this code.

Bottom line, we can create sockets normally, and those will have process
context. But we also can create sockets by cloning existing sockets. To
the best of my knowledge, this is done by things like accept().

Because those sockets are a clone of their ancestors, they also belong
to a workload that should be limited. Also note that because they have
cgroup context, we will eventually try to put them. So we need to grab
an extra reference.

socket_update_cgroup is always called at socket creation, and the
original structures are filled with zeroes. Therefore cloning is the
*only* path that takes us here with sk->sk_cgroup filled.

My comment right above this excerpt states:

/* Socket cloning can throw us here with sk_cgrp already
* filled. It won't however, necessarily happen from
* process context. So the test for root memcg given
* the current task's memcg won't help us in this case.
*
* Respecting the original socket's memcg is a better
* decision in this case.
*/

>
>>> return;
>>> }
>>>
>>> rcu_read_lock();
>>> memcg = mem_cgroup_from_task(current);
>>> cg_proto = sk->sk_prot->proto_cgroup(memcg);
>>> - if (!mem_cgroup_is_root(memcg) && memcg_proto_active(cg_proto)) {
>>> - mem_cgroup_get(memcg);
>>> + if (!mem_cgroup_is_root(memcg) &&
>>> + memcg_proto_active(cg_proto) && css_tryget(&memcg->css)) {
>>> sk->sk_cgrp = cg_proto;
>>> }
>>
>> What happens if this tryget fails ? Won't we leak a reference here? We
>> will put regardless when the socket is released, and this may go
>> negative. No?
>
> AFAICS sock_release_memcg releases the reference only if sk->sk_cgrp and
> that one wouldn't be set if css_tryget fails.
>

Yes, this is totally fine. I was actually thinking of the same socket
cloning I mentioned above. We cannot fail that path because we already
have an "implicit" reference, we just need to officially mark it.

Failing here is indeed fine. Future cloned sockets from this socket will
have NULL cgroup context as well.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/