Re: [v2 PATCH 3/9] mm: vmscan: guarantee shrinker_slab_memcg() sees valid shrinker_maps for online memcg

From: Johannes Weiner
Date: Tue Dec 15 2020 - 12:17:35 EST


On Mon, Dec 14, 2020 at 02:37:16PM -0800, Yang Shi wrote:
> shrink_slab_memcg() races with mem_cgroup_css_online(). Visibility of the CSS_ONLINE flag
> in shrink_slab_memcg()->mem_cgroup_online() does not guarantee that we will see
> memcg->nodeinfo[nid]->shrinker_maps != NULL. This may occur because of processor reordering
> on !x86.
>
> This looks like the following case:
>
> CPU A                     CPU B
> store shrinker_map        load CSS_ONLINE
> store CSS_ONLINE          load shrinker_map

But we have a separate check on shrinker_maps, so it doesn't matter
that it isn't guaranteed, no?

The only downside I can see is when CSS_ONLINE isn't visible yet and
we bail out even though we'd be ready to shrink, although it's probably
unlikely that any objects would have been allocated by then...

Can somebody remind me why we check mem_cgroup_online() at all?

If shrinker_map is set, we can shrink: .css_alloc is guaranteed to be
complete, and by using RCU for the shrinker_map pointer, the map is
also guaranteed to be initialized. There is nothing else happening
during onlining that you may depend on.

If shrinker_map isn't set, we cannot iterate the bitmap. It does not
really matter whether CSS_ONLINE is reordered and visible already.
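To make the publish/subscribe argument concrete, here is a minimal
userspace sketch using C11 atomics. It assumes a release store and an
acquire load as stand-ins for rcu_assign_pointer() and rcu_dereference();
every name in it (memcg_sketch, publish_map(), count_set_shrinkers(),
etc.) is hypothetical and not the actual memcg/vmscan code. It runs
single-threaded so it is self-contained, but the ordering annotations are
what would matter across CPUs: the reader decides on the map pointer
alone and never consults the "online" flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sizes: a fixed number of registered shrinkers. */
#define NR_SHRINKERS_SKETCH 64
#define BITS_PER_WORD (8 * sizeof(unsigned long))
#define MAP_WORDS ((NR_SHRINKERS_SKETCH + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical stand-in for a per-node shrinker_map bitmap. */
struct shrinker_map_sketch {
	unsigned long bits[MAP_WORDS];
};

/* Hypothetical stand-in for the two memcg fields discussed above. */
struct memcg_sketch {
	_Atomic(struct shrinker_map_sketch *) map;
	atomic_bool online;	/* plays the role of CSS_ONLINE; never read below */
};

/*
 * Allocation side (.css_alloc): initialize the bitmap completely, then
 * publish the pointer. The release store stands in for
 * rcu_assign_pointer(): a reader that observes the pointer also
 * observes the zeroed bitmap.
 */
static bool publish_map(struct memcg_sketch *memcg)
{
	struct shrinker_map_sketch *map = malloc(sizeof(*map));

	if (!map)
		return false;
	memset(map, 0, sizeof(*map));
	atomic_store_explicit(&memcg->map, map, memory_order_release);
	return true;
}

/* Onlining side: its order relative to publish_map() is irrelevant to
 * the reader, which never looks at this flag. */
static void mark_online(struct memcg_sketch *memcg)
{
	atomic_store_explicit(&memcg->online, true, memory_order_relaxed);
}

/*
 * Shrink side: decide on the pointer alone. The acquire load stands in
 * for rcu_dereference(). Returns -1 if there is no map to iterate,
 * otherwise the number of set shrinker bits.
 */
static int count_set_shrinkers(struct memcg_sketch *memcg)
{
	struct shrinker_map_sketch *map;
	int n = 0;

	assert(memcg);
	map = atomic_load_explicit(&memcg->map, memory_order_acquire);
	if (!map)
		return -1;	/* not allocated yet: bail out */
	for (size_t i = 0; i < NR_SHRINKERS_SKETCH; i++)
		if (map->bits[i / BITS_PER_WORD] & (1UL << (i % BITS_PER_WORD)))
			n++;
	return n;
}
```

Whether mark_online() runs before or after publish_map(), or becomes
visible in either order, count_set_shrinkers() gives the same answer: -1
while the pointer is NULL, and a walk over a fully initialized bitmap
once it isn't.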

I agree with Dave: if we need that synchronization around onlining, it
needs to happen inside the cgroup core. But I wouldn't add it until
somebody actually requires it.