Re: [PATCH 1/4] mm: Trial do_wp_page() simplification

From: Michal Hocko
Date: Mon Sep 21 2020 - 10:28:42 EST


On Mon 21-09-20 10:18:30, Peter Xu wrote:
> Hi, Michal,
>
> On Mon, Sep 21, 2020 at 03:42:00PM +0200, Michal Hocko wrote:
[...]
> > I have only now
> > learned about this feature so I am not deeply familiar with all the
> > details and I might be easily wrong. Normally all the cgroup aware
> > resources are accounted to the parent's cgroup. For memcg that includes
> > all the page tables, early CoW and other allocations with __GFP_ACCOUNT.
> > IIUC CLONE_INTO_CGROUP properly then this hasn't changed as the child is
> > associated to its new cgroup (and memcg) only in cgroup_post_fork. If
> > that is correct then we might have quite a lot of resources bound to
> > child's lifetime but accounted to the parent's memcg which can lead to
> > all sorts of interesting problems (e.g. unreclaimable memory - even by
> > the oom killer).
>
> Right that's one of the things that I'm confused too, on that if we always
> account to the parent, then when the child quits whether we uncharge them or
> not, and how.. Not sure whether the accounting of the parent could steadily
> grow as it continues the fork()s.
>
> So is it by design that we account all these to the parents?

Let me try to clarify a bit further my concern. Without CLONE_INTO_CGROUP
this makes some sense. Because both parent and child will live in
the same cgroup. All the charges are reference counted so they will
be released when the respective resource gets freed (e.g. page table
released or the backing page dropped) irrespective of the current cgroup
the owner is living in.

Fundamentaly CLONE_INTO_CGROUP is similar to regular fork + move to the
target cgroup after the child gets executed. So in principle there
shouldn't be any big difference. Except that the move has to be explicit
and the the child has to have enough privileges to move itself. I am not
completely sure about CLONE_INTO_CGROUP model though. According to man
clone(2) it seems that O_RDONLY for the target cgroup directory is
sufficient. That seems much more relaxed IIUC and it would allow to fork
into a different cgroup while keeping a lot of resources in the parent's
proper.
--
Michal Hocko
SUSE Labs