Re: [PATCH 5/5] mm: workingset: make shadow node shrinker memcg aware
From: Vladimir Davydov
Date: Mon Feb 08 2016 - 09:28:48 EST
On Mon, Feb 08, 2016 at 01:23:53AM -0500, Johannes Weiner wrote:
> On Sun, Feb 07, 2016 at 08:27:35PM +0300, Vladimir Davydov wrote:
> > Workingset code was recently made memcg aware, but shadow node shrinker
> > is still global. As a result, one small cgroup can consume all memory
> > available for shadow nodes, possibly hurting other cgroups by reclaiming
> > their shadow nodes, even though reclaim distances stored in its shadow
> > nodes have no effect. To avoid this, we need to make shadow node
> > shrinker memcg aware.
> >
> > Signed-off-by: Vladimir Davydov <vdavydov@xxxxxxxxxxxxx>
>
> This patch is straight forward, but there is one tiny thing that bugs
> me about it, and that is switching from available memory to the size
> of the active list. Because the active list can shrink drastically at
> runtime.
Yeah, active file lru is a volatile thing indeed. Not only can it shrink
rapidly, it can also grow in an instant (e.g. due to mark_page_accessed)
so you're right - sizing shadow node lru basing solely on the active lru
size would be too unpredictable.
>
> It's true that both the shrinking of the active list and subsequent
> activations to regrow it will reduce the number of actionable
> refaults, and so it wouldn't be unreasonable to also shrink shadow
> nodes when the active list shrinks.
>
> However, I think these are too many assumptions to encode in the
> shrinker, because it is only meant to prevent a worst-case explosion
> of radix tree nodes. I'd prefer it to be dumb and conservative.
>
> Could we instead go with the current usage of the memcg? Whether
> reclaim happens globally or due to the memory limit, the usage at the
> time of reclaim gives a good idea of the memory is available to the
> group. But it's making less assumptions about the internal composition
> of the memcg's memory, and the consequences associated with that.
But that would likely result in wasting a considerable chunk of memory
for stale shadow nodes in case file caches constitute only a small part
of memcg memory consumption, which isn't good IMHO.
May be, we'd better use LRU_ALL_FILE / 2 instead?
diff --git a/mm/workingset.c b/mm/workingset.c
index 8c07cd8af15e..8a75f8d2916a 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -351,9 +351,10 @@ static unsigned long count_shadow_nodes(struct shrinker *shrinker,
if (memcg_kmem_enabled())
pages = mem_cgroup_node_nr_lru_pages(sc->memcg, sc->nid,
- BIT(LRU_ACTIVE_FILE));
+ LRU_ALL_FILE);
else
- pages = node_page_state(sc->nid, NR_ACTIVE_FILE);
+ pages = node_page_state(sc->nid, NR_ACTIVE_FILE) +
+ node_page_state(sc->nid, NR_INACTIVE_FILE);
/*
* Active cache pages are limited to 50% of memory, and shadow
@@ -369,7 +370,7 @@ static unsigned long count_shadow_nodes(struct shrinker *shrinker,
*
* PAGE_SIZE / radix_tree_nodes / node_entries / PAGE_SIZE
*/
- max_nodes = pages >> (RADIX_TREE_MAP_SHIFT - 3);
+ max_nodes = pages >> (1 + RADIX_TREE_MAP_SHIFT - 3);
if (shadow_nodes <= max_nodes)
return 0;