Re: [BUGFIX][PATCH v2] add mem_cgroup_replace_page_cache.

From: KAMEZAWA Hiroyuki
Date: Sun Dec 11 2011 - 19:49:34 EST


On Fri, 9 Dec 2011 12:37:01 -0800
Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Thu, 8 Dec 2011 16:18:29 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
>
> > commit ef6a3c6311 adds a function replace_page_cache_page(). This
> > function replaces a page in the radix-tree with a new page.
> > When doing this, the memory cgroup needs to fix up the accounting
> > information; memcg needs to check the PCG_USED bit, etc.
> >
> > In some (many?) cases, 'newpage' is on the LRU before calling
> > replace_page_cache_page(). So memcg's LRU accounting information
> > should be fixed, too.
> >
> > This patch adds mem_cgroup_replace_page_cache() and removes the old hooks.
> > In that function, the old page is unaccounted without touching res_counter
> > and the new page is accounted to the memcg (of the old page). When
> > overwriting pc->mem_cgroup of newpage, it takes zone->lru_lock to avoid
> > racing with LRU handling.
> >
> > Background:
> > replace_page_cache_page() is called by FUSE code in its splice() handling.
> > Here, 'newpage' replaces oldpage, but this newpage is not a newly allocated
> > page and may be on the LRU. LRU mis-accounting is critical for memory cgroup
> > because rmdir() checks that the whole LRU is empty and that there is no
> > account leak. If a page is on a different LRU than it should be, rmdir()
> > will fail.
> >
> > Changelog: v1 -> v2
> > - fixed the missing mem_cgroup_disabled() check.
> > - added comments.
> >
> > Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
> > ---
> > include/linux/memcontrol.h | 6 ++++++
> > mm/filemap.c | 18 ++----------------
> > mm/memcontrol.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
> > 3 files changed, 52 insertions(+), 16 deletions(-)
>
> It's a relatively intrusive patch and I'm a bit concerned about
> feeding it into 3.2.
>
> How serious is the bug, and which kernel version(s) do you think we
> should fix it in?

This bug was introduced by commit ef6a3c63112e (March 2011), but there has
been no bug report yet. I guess there are not many people who use memcg and
FUSE at the same time with upstream kernels.

The result of this bug is that the admin cannot destroy a memcg because of
an account leak. So there is no panic and no deadlock. And even if an active
cgroup exists, umount can succeed, so there is no problem at shutdown.
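
To illustrate the failure mode, here is a toy userspace model in Python. It is
only a sketch and not the kernel code: the Memcg and Page classes, the
force_empty()/rmdir() methods, the relink_lru flag and the scenario() helper
are all hypothetical names made up for this example. It assumes rmdir()
reclaims charges by walking the group's own LRU, so a page that is charged to
the group but linked on another LRU is never found.

```python
class Memcg:
    """Toy stand-in for a memory cgroup (hypothetical, not kernel code)."""

    def __init__(self, name):
        self.name = name
        self.usage = 0          # stands in for the res_counter usage, in pages
        self.lru_pages = set()  # pages linked on this group's LRU

    def force_empty(self):
        # Walk only this group's LRU and uncharge the pages found there.
        for page in list(self.lru_pages):
            self.lru_pages.discard(page)
            page.on_lru_of = None
            if page.memcg is self:
                page.memcg = None
                self.usage -= 1

    def rmdir(self):
        # Removal succeeds only if no charge is left behind.
        self.force_empty()
        return self.usage == 0


class Page:
    def __init__(self):
        self.memcg = None      # stands in for pc->mem_cgroup
        self.on_lru_of = None  # which group's LRU the page is linked on


def link_lru(page, memcg):
    memcg.lru_pages.add(page)
    page.on_lru_of = memcg


def charge(page, memcg):
    page.memcg = memcg
    memcg.usage += 1
    link_lru(page, memcg)


def replace_page_cache(old, new, relink_lru):
    """Hand old's charge to new; usage is left untouched (assumed model)."""
    memcg = old.memcg
    old.memcg = None
    if old.on_lru_of is not None:
        old.on_lru_of.lru_pages.discard(old)
        old.on_lru_of = None
    new.memcg = memcg
    # The fix: if newpage already sits on another LRU, move it.
    if relink_lru and new.on_lru_of is not memcg:
        if new.on_lru_of is not None:
            new.on_lru_of.lru_pages.discard(new)
        link_lru(new, memcg)


def scenario(relink_lru):
    root, grp = Memcg("root"), Memcg("grp")
    old, new = Page(), Page()
    charge(old, grp)
    link_lru(new, root)  # newpage is already on an LRU, but not charged to grp
    replace_page_cache(old, new, relink_lru)
    return grp.rmdir()
```

In this model scenario(False) leaves one page charged to "grp" that its LRU
walk cannot see, so rmdir() reports failure, while scenario(True) relinks the
page and the group can be removed.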

I would like this fix to be merged when or after the unify-lru work goes upstream.

Thanks,
-Kame
