Re: [PATCH 03/10] mm: Assign memcg-aware shrinkers bitmap to memcg

From: Kirill Tkhai
Date: Tue Mar 27 2018 - 11:17:47 EST


On 27.03.2018 13:00, Vladimir Davydov wrote:
> On Mon, Mar 26, 2018 at 06:29:05PM +0300, Kirill Tkhai wrote:
>>>> @@ -182,6 +187,9 @@ struct mem_cgroup {
>>>> unsigned long low;
>>>> unsigned long high;
>>>>
>>>> + /* Bitmap of shrinker ids suitable to call for this memcg */
>>>> + struct shrinkers_map __rcu *shrinkers_map;
>>>> +
>>>
>>> We keep all per-node data in mem_cgroup_per_node struct. I think this
>>> bitmap should be defined there as well.
>>
>> But them we'll have to have struct rcu_head for every node to free the map
>> via rcu. This is the only reason I did that. But if you think it's not a problem,
>> I'll agree with you.
>
> I think it's OK. It'd be consistent with how list_lru handles
> list_lru_memcg reallocations.
>
>>>> @@ -4487,6 +4490,8 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
>>>> struct mem_cgroup *memcg = mem_cgroup_from_css(css);
>>>> struct mem_cgroup_event *event, *tmp;
>>>>
>>>> + free_shrinker_maps(memcg);
>>>> +
>>>
>>> AFAIU this can race with shrink_slab accessing the map, resulting in
>>> use-after-free. IMO it would be safer to free the bitmap from css_free.
>>
>> But doesn't shrink_slab() iterate only online memcg?
>
> Well, yes, shrink_slab() bails out if the memcg is offline, but I
> suspect there might be a race condition between shrink_slab and
> css_offline when shrink_slab calls shrinkers for an offline cgroup.
>
>>
>>>> /*
>>>> * Unregister events and notify userspace.
>>>> * Notify userspace about cgroup removing only after rmdir of cgroup
>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>> index 97ce4f342fab..9d1df5d90eca 100644
>>>> --- a/mm/vmscan.c
>>>> +++ b/mm/vmscan.c
>>>> @@ -165,6 +165,10 @@ static DECLARE_RWSEM(bitmap_rwsem);
>>>> static int bitmap_id_start;
>>>> static int bitmap_nr_ids;
>>>> static struct shrinker **mcg_shrinkers;
>>>> +struct shrinkers_map *__rcu root_shrinkers_map;
>>>
>>> Why do you need root_shrinkers_map? AFAIR the root memory cgroup doesn't
>>> have kernel memory accounting enabled.
>> But we can charge the corresponding lru and iterate it over global reclaim,
>> don't we?
>
> Yes, I guess you're right. But do we need to care about it? Would it be
> OK if we iterated over all shrinkers for the root cgroup? Dunno...

In case of 2000 shrinkers, this will flush the cache. This is the reason :)

> Anyway, please try to handle the root cgroup consistently with other
> cgroups. I mean, nothing like this root_shrinkers_map should exist.
> It should be either a part of root_mem_cgroup or we should iterate over
> all shrinkers for the root cgroup.

It's not possible. root_mem_cgroup does not exist always. Even if CONFIG_MEMCG
is enabled, memcg may be prohibited by boot params.

In case of it's not prohibited, there are some shrinkers, which are registered
before it's initialized, while memory_cgrp_subsys can't has .early_init = 1.

>>
>> struct list_lru_node {
>> ...
>> /* global list, used for the root cgroup in cgroup aware lrus */
>> struct list_lru_one lru;
>> ...
>> };

Kirill