Re: [RFC PATCH] memcg: expose children memory usage for root

From: Yosry Ahmed
Date: Fri Jul 26 2024 - 12:26:20 EST


On Fri, Jul 26, 2024 at 8:48 AM Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
>
> On Thu, Jul 25, 2024 at 04:20:45PM GMT, Yosry Ahmed wrote:
> > On Mon, Jul 22, 2024 at 3:53 PM Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
> > >
> > > The Linux kernel does not expose memory.current on the root memcg, and
> > > some applications have to traverse all the top-level memcgs to
> > > calculate the total memory charged in the system. This is more expensive
> > > (a directory traversal plus multiple opens and reads) and is racy on a
> > > busy machine. As the kernel already has the needed information, i.e. the
> > > root's memory.current, why not expose it?
> > >
> > > However, the root's memory.current would have different semantics from
> > > a non-root memory.current, since the kernel skips charging for the root,
> > > so it may be better to give the root a differently named interface,
> > > something like memory.children_usage that exists only for the root memcg.
> > >
> > > That still leaves the question of why the kernel does not expose
> > > memory.current for the root. Historically, memcg charging was
> > > expensive, and running in the root let users bypass it. Do we still
> > > want this exception today? What is stopping us from charging the root
> > > memcg as well? The root would still have no limits, but its
> > > allocations would go through memcg charging, and root and non-root
> > > memory.current would then have the same semantics.
> > >
> > > This is an RFC to start a discussion on memcg charging for root.
> >
> > I vaguely remember that when running some netperf tests (tcp_rr?) in a
> > cgroup, performance decreased considerably with every level down the
> > hierarchy. I am assuming that charging was part of the reason. If
> > that's the case, charging the root would be similar to moving all
> > workloads one level down the hierarchy in terms of charging overhead.
>
> No, the workloads running in non-root memcgs will not see any
> difference. Only the workloads running in root will see charging
> overhead.

Oh yeah, we already charge the root's page counters hierarchically in
the upstream kernel; we just do not charge them when the charge
originates in the root itself.

We also have workloads that iterate top-level memcgs to calculate the
total charged memory, so memory.children_usage for the root memcg
would help.
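For reference, the userspace traversal such workloads do today might look like the sketch below. It assumes cgroup v2 mounted at `/sys/fs/cgroup` and treats every direct child directory that has a `memory.current` file as a top-level memcg; the function name is hypothetical. Each child costs an open and a read, and children can appear or vanish mid-walk, which is the expense and raciness the RFC describes. A single root-level file such as the proposed `memory.children_usage` would replace the whole loop with one read.

```python
import os


def sum_top_level_usage(cgroup_root: str = "/sys/fs/cgroup") -> int:
    """Sum memory.current (bytes) over the direct children of the root cgroup.

    Assumes a cgroup v2 hierarchy mounted at cgroup_root. Plain files in the
    root directory (cgroup.procs, ...) are ignored.
    """
    total = 0
    with os.scandir(cgroup_root) as entries:
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue
            path = os.path.join(entry.path, "memory.current")
            try:
                with open(path) as f:
                    total += int(f.read().strip())
            except FileNotFoundError:
                # Memory controller not enabled for this child, or the cgroup
                # was removed between listing and reading (a race).
                continue
    return total
```

On a live cgroup v2 system, `sum_top_level_usage()` with no argument walks the real hierarchy.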

As for memory.current, do you have any data about how much memory is
charged to the root itself? We think of the memory charged to the root
as system overhead, while the memory charged to top-level memcgs
isn't.

So basically total_memory - root::memory.children_usage would be a
fast way to get a rough estimate of system overhead. The same would
not apply to total_memory - root::memory.current, if I understand
correctly, since a charged root's memory.current would also include
the memory charged to the root itself.
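The arithmetic behind this estimate can be sketched as follows. `MemTotal` is parsed from `/proc/meminfo` text (reported in kB); `children_usage` stands for what the proposed `memory.children_usage` file would report, which does not exist upstream, so it is passed in as a plain number. The helper names and the GiB figures are made up for illustration. If the root were charged, its `memory.current` would be `children_usage` plus the root's own charges, so subtracting it from the total drops exactly the part counted as overhead. As written, both differences also include free memory.

```python
def parse_memtotal_bytes(meminfo_text: str) -> int:
    """Return MemTotal in bytes from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line format: "MemTotal:       16318412 kB"
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")


def overhead_from_children(mem_total: int, children_usage: int) -> int:
    """total_memory - root::memory.children_usage (proposed interface)."""
    return mem_total - children_usage


def overhead_from_charged_root(mem_total: int, children_usage: int,
                               root_self: int) -> int:
    """total_memory - root::memory.current, if the root were also charged.

    A charged root's memory.current would cover its children plus the
    memory charged to the root itself, so root_self drops out of the result.
    """
    root_current = children_usage + root_self
    return mem_total - root_current


if __name__ == "__main__":
    GiB = 1 << 30
    # Hypothetical machine: 16 GiB total, 12 GiB in top-level memcgs,
    # 1 GiB charged to the root itself.
    print(overhead_from_children(16 * GiB, 12 * GiB) // GiB)           # 4
    print(overhead_from_charged_root(16 * GiB, 12 * GiB, GiB) // GiB)  # 3
```

On a live system, `parse_memtotal_bytes` would be fed the contents of `/proc/meminfo`.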