Re: [PATCH v3 4/5] mm: memcg: charge memcg percpu memory to the parent cgroup

From: Roman Gushchin
Date: Mon Aug 10 2020 - 15:33:50 EST


On Thu, Aug 06, 2020 at 09:37:17PM -0700, Roman Gushchin wrote:
> On Thu, Aug 06, 2020 at 09:16:03PM -0700, Andrew Morton wrote:
> > On Wed, 29 Jul 2020 19:10:39 +0200 Michal Koutný <mkoutny@xxxxxxxx> wrote:
> >
> > > Hello.
> > >
> > > On Tue, Jun 23, 2020 at 11:45:14AM -0700, Roman Gushchin <guro@xxxxxx> wrote:
> > > > Because the size of memory cgroup internal structures can dramatically
> > > > exceed the size of the object or page which is pinning it in memory, it's
> > > > not a good idea to simply ignore it. It actually breaks the isolation
> > > > between cgroups.
> > > No doubt about accounting the memory if it's a significant amount.
> > >
> > > > Let's account the consumed percpu memory to the parent cgroup.
> > > Why did you choose charging to the parent of the created cgroup?
> > >
> > > Should the charge go to the cgroup _that is creating_ the new memcg?
> > >
> > > One reason is that there are the throttling mechanisms for memory limits
> > > and those are better exercised when the actor and its memory artefact
> > > are the same cgroup, aren't they?
>
> Hi!
>
> In general, yes. But in this case I think it wouldn't be a good idea:
> most often cgroups are created by a centralized daemon (systemd),
> which is usually located in the root cgroup. Even if it's not located in
> the root cgroup, limiting its memory will likely affect the whole system,
> even if only one specific limit was reached.
> If there is a containerized workload, which creates sub-cgroups,
> charging its parent cgroup is perfectly effective.
>
> And conversely, if we charge the cgroup of the process that created
> a cgroup, we won't cover the most common case: systemd creating
> cgroups for all services in the system.
>
> > >
> > > The second reason is based on the example from Delegation Containment
> > > (Documentation/admin-guide/cgroup-v2.rst)
> > >
> > > > For an example, let's assume cgroups C0 and C1 have been delegated to
> > > > user U0 who created C00, C01 under C0 and C10 under C1 as follows and
> > > > all processes under C0 and C1 belong to U0::
> > > >
> > > > ~~~~~~~~~~~~~ - C0 - C00
> > > > ~ cgroup    ~      \ C01
> > > > ~ hierarchy ~
> > > > ~~~~~~~~~~~~~ - C1 - C10
> > >
> > > Thanks to permissions, a task running in C0 that creates a cgroup in C1
> > > would deplete C1's supply, victimizing tasks inside C1.
>
> Right, but it's quite unusual for tasks from one cgroup to create sub-cgroups
> in a completely different cgroup. In this particular case there are tons of
> other ways in which a task from C00 can hurt C1.
>
> >
> > These week-old issues appear to be significant. Roman? Or someone
> > else?
>
> Oh, I'm sorry, somehow I missed this letter.
> Thank you for pointing it out!

Hello, Michal!

Do you have any concerns left here, or is it good to go?

It seems that this is blocking the whole percpu accounting patchset from being
merged, and I still hope it can be squeezed into 5.9.

Thank you!

Roman