Re: [PATCH] memcg: deprecate memory.force_empty knob

From: Michal Hocko
Date: Mon May 19 2014 - 10:02:54 EST


On Fri 16-05-14 15:00:16, Greg Thelen wrote:
> On Tue, May 13 2014, Michal Hocko <mhocko@xxxxxxx> wrote:
[...]
> > If somebody really cares because reparented pages, which would be
> > dropped otherwise, push out more important ones then we should fix the
> > reparenting code and put pages to the tail.
>
> I should mention a case where I've needed to use memory.force_empty: to
> synchronously flush stats from child to parent. Without force_empty
> memory.stat is temporarily inconsistent until async css_offline
> reparents charges. Here is an example on v3.14 showing that
> parent/memory.stat contents are in-flux immediately after rmdir of
> parent/child.

OK, it is true that the delayed offlining makes this a little bit
complicated because there is no direct user-visible relation between
rmdir and css_offline.

> $ cat /test
> #!/bin/bash
>
> # Create parent and child. Add some non-reclaimable anon rss to child,
> # then move running task to parent.
> mkdir p p/c
> (echo $BASHPID > p/c/cgroup.procs && exec sleep 1d) &
> pid=$!
> sleep 1
> echo $pid > p/cgroup.procs
>
> grep 'rss ' {p,p/c}/memory.stat
> if [[ $1 == force ]]; then
> echo 1 > p/c/memory.force_empty
> fi
> rmdir p/c
>
> echo 'For a small time the p/c memory has not been reparented to p.'
> grep 'rss ' {p,p/c}/memory.stat
>
> sleep 1
> echo 'After waiting all memory has been reparented'
> grep 'rss ' {p,p/c}/memory.stat
>
> kill $pid
> rmdir p
>
>
> -- First, demonstrate that just rmdir, without memory.force_empty,
> temporarily hides reparented child memory stats.
>
> $ /test
> p/memory.stat:rss 0
> p/memory.stat:total_rss 69632
> p/c/memory.stat:rss 69632
> p/c/memory.stat:total_rss 69632
> For a small time the p/c memory has not been reparented to p.
> p/memory.stat:rss 0
> p/memory.stat:total_rss 0

OK, this is a bug. Our iterators skip the child because css_tryget
already fails for it while its css_offline has not finished yet. This is
fixable, though, and force_empty is just a workaround, so I don't see
this as a proper justification to keep it alive.
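
To illustrate the failure mode, here is a toy userspace model of the p/c
case. It is not kernel code. It assumes a single child whose state is one
of the hypothetical labels online, dying (rmdir done, css_offline still
pending) and offlined (charges already reparented), plus a stand-in
tryget function that, like css_tryget after rmdir, succeeds only while
the child is online. The walk therefore drops the dying child's 69632
bytes before they have been moved to the parent, which matches the
total_rss values in the output above.

```shell
#!/bin/bash
# Toy model (NOT kernel code) of summing total_rss for p with one child p/c.
# child_state is a hypothetical label: online, dying or offlined.

parent_rss=0        # bytes charged directly to p
child_rss=69632     # bytes still charged to p/c
child_state=online

# Stand-in for css_tryget(): fails as soon as the child has been rmdir'ed.
tryget() { [ "$child_state" = online ]; }

# p's own rss plus every child the iterator is willing to visit.
total_rss() {
    local total=$parent_rss
    if tryget; then
        total=$((total + child_rss))
    fi
    echo "$total"
}

total_rss                 # prints 69632 (before rmdir)
child_state=dying         # rmdir done, css_offline not run yet
total_rss                 # prints 0: the child's charges are hidden
# css_offline reparents the child's charges to p
parent_rss=$((parent_rss + child_rss)); child_rss=0; child_state=offlined
total_rss                 # prints 69632 again
```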

One possible way to fix this is to iterate over children even when
css_tryget fails for them, as long as they haven't finished css_offline
yet. There are some changes in the cgroup core which should make this
easier, and Johannes said he has some work in that area.
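
The proposed rule can be sketched with a toy userspace model. It is not
kernel code; the state labels (online, dying, offlined) and the
should_visit/total_rss helpers are hypothetical. The walk includes a
child if css_tryget would succeed or if the child is dying but its
css_offline has not finished, so p's total stays the same before rmdir,
between rmdir and css_offline, and after reparenting.

```shell
#!/bin/bash
# Toy model (NOT kernel code) of the proposed iterator rule.
# Hypothetical child states:
#   online   - css_tryget would succeed
#   dying    - rmdir done, css_offline has not finished yet
#   offlined - css_offline has reparented the charges

# Should the hierarchical walk include a child in state $1?
should_visit() {
    case $1 in
        online|dying) return 0 ;;
        *)            return 1 ;;
    esac
}

# total_rss PARENT_RSS CHILD_RSS CHILD_STATE
total_rss() {
    local total=$1
    if should_visit "$3"; then
        total=$((total + $2))
    fi
    echo "$total"
}

total_rss 0 69632 online      # before rmdir: 69632
total_rss 0 69632 dying       # after rmdir, before css_offline: 69632
total_rss 69632 0 offlined    # after reparenting: 69632
```

The model treats each state change as atomic. A real implementation
would presumably also have to make sure that pages are neither counted
twice nor missed while css_offline is moving the charges to the parent.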

Anyway this is a useful testcase. Thanks Greg!

> grep: p/c/memory.stat: No such file or directory
> After waiting all memory has been reparented
> p/memory.stat:rss 69632
> p/memory.stat:total_rss 69632
> grep: p/c/memory.stat: No such file or directory
> /test: Terminated ( echo $BASHPID > p/c/cgroup.procs && exec sleep 1d )
>
> -- Demonstrate that using memory.force_empty before rmdir behaves more
> sensibly. Stats for reparented child memory are not hidden.
>
> $ /test force
> p/memory.stat:rss 0
> p/memory.stat:total_rss 69632
> p/c/memory.stat:rss 69632
> p/c/memory.stat:total_rss 69632
> For a small time the p/c memory has not been reparented to p.
> p/memory.stat:rss 69632
> p/memory.stat:total_rss 69632
> grep: p/c/memory.stat: No such file or directory
> After waiting all memory has been reparented
> p/memory.stat:rss 69632
> p/memory.stat:total_rss 69632
> grep: p/c/memory.stat: No such file or directory
> /test: Terminated ( echo $BASHPID > p/c/cgroup.procs && exec sleep 1d )

--
Michal Hocko
SUSE Labs