Re: [PATCH] memcg: expose root cgroup's memory.stat

From: Chris Down
Date: Fri May 15 2020 - 21:43:01 EST


Hey Shakeel,

Shakeel Butt writes:
One way to measure the efficiency of memory reclaim is to look at the
ratio (pgscan+pfrefill)/pgsteal. However at the moment these stats are
not updated consistently at the system level and the ratio of these are
not very meaningful. The pgsteal and pgscan are updated for only global
reclaim while pgrefill gets updated for global as well as cgroup
reclaim.

Please note that this difference is only for system level vmstats. The
cgroup stats returned by memory.stat are actually consistent. The
cgroup's pgsteal contains number of reclaimed pages for global as well
as cgroup reclaim. So, one way to get the system level stats is to get
these stats from root's memory.stat, so, expose memory.stat for the root
cgroup.

from Johannes Weiner:
There are subtle differences between /proc/vmstat and
memory.stat, and cgroup-aware code that wants to watch the full
hierarchy currently has to know about these intricacies and
translate semantics back and forth.

Generally having the fully recursive memory.stat at the root
level could help a broader range of usecases.

Signed-off-by: Shakeel Butt <shakeelb@xxxxxxxxxx>
Suggested-by: Johannes Weiner <hannes@xxxxxxxxxxx>

The patch looks great, thanks. To that extent you can add my ack:

Acked-by: Chris Down <chris@xxxxxxxxxxxxxx>

One concern about the API now exposed, though: to a new cgroup v2 user this looks fairly dodgy as a sole stat file (except for cgroup.stat) at the root. If I used cgroup v2 for the first time and only saw memory.stat and cgroup.stat there, but for some reason io.stat and cpu.stat are not available at the root but are available everywhere else, I think my first thought would be that the cgroup v2 developers must have been on some strong stuff when they came up with this ;-)

Even if they're only really duplicating information available elsewhere right now, have you considered exposing the rest of the stat files as well so that the API maintains a bit more consistency? As a bonus, that also means userspace applications can parse in the same way from the root down.