Re: [PATCH v2 04/12] mm: Assign memcg-aware shrinkers bitmap to memcg
From: Kirill Tkhai
Date: Tue Apr 24 2018 - 08:25:10 EST
On 24.04.2018 15:15, Vladimir Davydov wrote:
> On Tue, Apr 24, 2018 at 02:38:51PM +0300, Kirill Tkhai wrote:
>> On 24.04.2018 14:28, Vladimir Davydov wrote:
>>> On Mon, Apr 23, 2018 at 01:54:50PM +0300, Kirill Tkhai wrote:
>>>>>> @@ -1200,6 +1206,8 @@ extern int memcg_nr_cache_ids;
>>>>>> void memcg_get_cache_ids(void);
>>>>>> void memcg_put_cache_ids(void);
>>>>>>
>>>>>> +extern int shrinkers_max_nr;
>>>>>> +
>>>>>
>>>>> memcg_shrinker_id_max?
>>>>
>>>> memcg_shrinker_id_max sounds like an inclusive value, doesn't it?
>>>> While shrinker->id < shrinkers_max_nr.
>>>>
>>>> Let's better use memcg_shrinker_nr_max.
>>>
>>> or memcg_nr_shrinker_ids (to match memcg_nr_cache_ids), not sure...
>>>
>>> Come to think of it, this variable is kinda awkward: it is defined in
>>> vmscan.c but declared in memcontrol.h; it is used by vmscan.c for max
>>> shrinker id and by memcontrol.c for shrinker map capacity. Just a raw
>>> idea: what about splitting it in two: one is private to vmscan.c, used
>>> as max id, say we call it shrinker_id_max; the other is defined in
>>> memcontrol.c and is used for shrinker map capacity, say we call it
>>> memcg_shrinker_map_capacity. What do you think?
>>
>> I don't much like a duplication of the single variable...
>
> Well, it's not really a duplication. For example, shrinker_id_max could
> decrease when a shrinker is unregistered while shrinker_map_capacity can
> only grow exponentially.
>
>> Are there real problems if it is defined in memcontrol.{c,h} and used in
>> both places?
>
> The code is more difficult to follow when variables are shared like that
> IMHO. I suggest you try it and see how it looks. Maybe it will only
> get worse and we'll have to revert to what we have now. Difficult to say
> without seeing the code.
>
>>
>>>>>> +int expand_shrinker_maps(int old_nr, int nr)
>>>>>> +{
>>>>>> + int id, size, old_size, node, ret;
>>>>>> + struct mem_cgroup *memcg;
>>>>>> +
>>>>>> + old_size = old_nr / BITS_PER_BYTE;
>>>>>> + size = nr / BITS_PER_BYTE;
>>>>>> +
>>>>>> + down_write(&shrinkers_max_nr_rwsem);
>>>>>> + for_each_node(node) {
>>>>>
>>>>> Iterating over cgroups first, numa nodes second seems like a better idea
>>>>> to me. I think you should fold for_each_node in memcg_expand_maps.
>>>>>
>>>>>> + idr_for_each_entry(&mem_cgroup_idr, memcg, id) {
>>>>>
>>>>> Iterating over mem_cgroup_idr looks strange. Why don't you use
>>>>> for_each_mem_cgroup?
>>>>
>>>> We want to allocate shrinker maps in mem_cgroup_css_alloc(), since
>>>> mem_cgroup_css_online() mustn't fail (it's a requirement of the
>>>> current design of memcg_cgroup::id).
>>>>
>>>> A new memcg is added to the parent's list between these two calls:
>>>>
>>>> css_create()
>>>>   ss->css_alloc()
>>>>   list_add_tail_rcu(&css->sibling, &parent_css->children)
>>>>   ss->css_online()
>>>>
>>>> for_each_mem_cgroup() does not see allocated, but not linked children.
>>>
>>> Why don't we move shrinker map allocation to css_online then?
>>
>> Because the design of memcg_cgroup::id does not allow mem_cgroup_css_online() to fail.
>> This function can't fail.
>
> I fail to understand why it is so. Could you please elaborate?
mem_cgroup::id is not freed in mem_cgroup_css_free(), but earlier: between
mem_cgroup_css_offline() and mem_cgroup_free(), after the last reference
is put.
If we sometimes also have to free it in mem_cgroup_css_free(), this will
introduce an asymmetry in the logic, which makes it harder to follow. There
was already a bug here, which I fixed in
"memcg: remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure".
A new change will make this code completely non-modular and unreadable.
>>
>> I don't think it would be good to dive into reworking this stuff in this patchset,
>> which is already really big. Also, it would be asymmetric to allocate one part of the
>> data in css_alloc() and another part in css_online(). This breaks the cgroup design,
>> which deliberately introduces these two functions to separate allocation from onlining.
>> Also, I've just moved the allocation to alloc_mem_cgroup_per_node_info(), as was
>> suggested in the comments to v1...
>
> Yeah, but (ab)using mem_cgroup_idr for iterating over all allocated
> memory cgroups looks rather dubious to me...
But we have to iterate over all allocated memory cgroups anyway,
as all of them must have expanded maps. What is the problem?
It's a rather simple method, and it is faster than a for_each_mem_cgroup()
loop, since it does not have to get and put refcounts.
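
To make the ordering argument concrete, here is a rough userspace model
(not kernel code). It assumes a group gets its id in the alloc step and is
linked to its parent only later. All names (model_cg, model_registry,
model_alloc() and so on) are made up for illustration; the array stands in
for mem_cgroup_idr and the child list for the css hierarchy.

```c
#include <stddef.h>

#define MODEL_MAX_IDS 8

/* Hypothetical stand-in for a memory cgroup; not the kernel's struct. */
struct model_cg {
	struct model_cg *first_child;
	struct model_cg *next_sibling;
	int map_bits;	/* capacity of this group's shrinker map, in bits */
};

/* Stand-in for the id registry: an entry appears at allocation time. */
static struct model_cg *model_registry[MODEL_MAX_IDS];

/* Step 1 (alloc-like): the group gets an id but is not linked yet. */
static int model_alloc(struct model_cg *cg, int map_bits)
{
	for (int id = 0; id < MODEL_MAX_IDS; id++) {
		if (!model_registry[id]) {
			cg->first_child = cg->next_sibling = NULL;
			cg->map_bits = map_bits;
			model_registry[id] = cg;
			return id;
		}
	}
	return -1;	/* no free id */
}

/* Step 2: the group becomes visible in the parent's children list. */
static void model_link(struct model_cg *parent, struct model_cg *cg)
{
	cg->next_sibling = parent->first_child;
	parent->first_child = cg;
}

/* Hierarchy walk: reaches only linked groups. Returns groups expanded. */
static int model_expand_by_tree(struct model_cg *root, int new_bits)
{
	int n = 1;

	root->map_bits = new_bits;
	for (struct model_cg *c = root->first_child; c; c = c->next_sibling)
		n += model_expand_by_tree(c, new_bits);
	return n;
}

/* Registry walk: reaches every allocated group, linked or not. */
static int model_expand_by_registry(int new_bits)
{
	int n = 0;

	for (int id = 0; id < MODEL_MAX_IDS; id++) {
		if (model_registry[id]) {
			model_registry[id]->map_bits = new_bits;
			n++;
		}
	}
	return n;
}
```

In this model, a group that has passed model_alloc() but not model_link()
keeps its old map size after model_expand_by_tree(), while
model_expand_by_registry() reaches it. The model ignores locking and
refcounting entirely.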
Kirill