Re: [PATCH] memcg: Remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure
From: Kirill Tkhai
Date: Fri Apr 13 2018 - 07:49:47 EST
On 13.04.2018 14:38, Michal Hocko wrote:
> On Fri 13-04-18 14:29:11, Kirill Tkhai wrote:
>> On 13.04.2018 14:20, Michal Hocko wrote:
>>> On Fri 13-04-18 14:06:40, Kirill Tkhai wrote:
>>>> On 13.04.2018 14:02, Michal Hocko wrote:
>>>>> On Fri 13-04-18 12:35:22, Kirill Tkhai wrote:
>>>>>> On 13.04.2018 11:55, Michal Hocko wrote:
>>>>>>> On Thu 12-04-18 17:52:04, Kirill Tkhai wrote:
>>>>>>> [...]
>>>>>>>> @@ -4471,6 +4477,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
>>>>>>>>
>>>>>>>> return &memcg->css;
>>>>>>>> fail:
>>>>>>>> + mem_cgroup_id_remove(memcg);
>>>>>>>> mem_cgroup_free(memcg);
>>>>>>>> return ERR_PTR(-ENOMEM);
>>>>>>>> }
>>>>>>>
>>>>>>> The only path which jumps to fail: here (in the current mmotm tree) is
>>>>>>> error = memcg_online_kmem(memcg);
>>>>>>> if (error)
>>>>>>> goto fail;
>>>>>>>
>>>>>>> AFAICS and the only failure path in memcg_online_kmem
>>>>>>> memcg_id = memcg_alloc_cache_id();
>>>>>>> if (memcg_id < 0)
>>>>>>> return memcg_id;
>>>>>>>
>>>>>>> I am not entirely clear on memcg_alloc_cache_id but it seems we do clean
>>>>>>> up properly. Or am I missing something?
>>>>>>
>>>>>> memcg_alloc_cache_id() may allocate a lot of memory, in case of the system reached
>>>>>> memcg_nr_cache_ids cgroups. In this case it iterates over all LRU lists, and double
>>>>>> size of every of them. In case of memory pressure it can fail. If this occurs,
>>>>>> mem_cgroup::id is not unhashed from IDR and we leak this id.
>>>>>
>>>>> OK, my bad I was looking at the bad code path. So you want to clean up
>>>>> after mem_cgroup_alloc not memcg_online_kmem. Now it makes much more
>>>>> sense. Sorry for the confusion on my end.
>>>>>
>>>>> Anyway, shouldn't we do the thing in mem_cgroup_free() to be symmetric
>>>>> to mem_cgroup_alloc?
>>>>
>>>> We can't, since it's called from mem_cgroup_css_free(), which doesn't have a deal
>>>> with idr freeing. All the asymmetry, we see, is because of the trick to unhash ID
>>>> earlier, then from mem_cgroup_css_free().
>>>
>>> Are you sure. It's been some time since I've looked at the quite complex
>>> cgroup tear down code but from what I remember, css_free is called on
>>> the css release (aka when the reference count drops to zero). mem_cgroup_id_put_many
>>> seems to unpin the css reference so we should have idr_remove by the
>>> time when css_free is called. Or am I still wrong and should go over the
>>> brain hurting cgroup removal code again?
>>
>> mem_cgroup_id_put_many() unpins css, but this may be not the last reference to the css.
>> Thus, we release ID earlier, then all references to css are freed.
>
> Right and so what. If we have released the idr then we are not going to
> do that again in css_free. That is why we have that memcg->id.id > 0
> check before idr_remove and memcg->id.id = 0 for the last memcg ref.
> count. So again, why cannot we do the clean up in mem_cgroup_free and
> have a less confusing code? Or am I just not getting your point and
> being dense here?
We can, but mem_cgroup_free() called from mem_cgroup_css_alloc() is unlikely case.
The likely case is mem_cgroup_free() is called from mem_cgroup_css_free(), where
this idr manipulations will be a noop. Noop in likely case looks more confusing
for me.
Less confusing will be to move
memcg->id.id = idr_alloc(&mem_cgroup_idr, NULL,
1, MEM_CGROUP_ID_MAX,
GFP_KERNEL);
into mem_cgroup_css_alloc(). How are you think about this?
Kirill