Re: [PATCH] memcg: Ignore unprotected parent in mem_cgroup_protected()

From: Chris Down
Date: Sat Jun 15 2019 - 12:13:14 EST


Hi Xunlei,

Xunlei Pang writes:
> Currently the memory.min|low implementation requires that the whole
> hierarchy has the settings, otherwise the protection will
> be broken.
>
> Our hierarchy looks something like this (memory.min value in brackets):
>
>         root
>          |
>      docker(0)
>       /     \
>  c1(max)   c2(0)
>
> Note that "docker" doesn't set memory.min. When kswapd runs,
> mem_cgroup_protected() returns "0" emin for "c1" due to the "0"
> @parent_emin of "docker", and as a result "c1" gets reclaimed.
>
> But it's hard to maintain the parent's "memory.min" when there are
> uncertain protected children, because only some important types
> of containers need the protection. Further, control tasks
> belonging to the parent constantly produce trivial memory which
> should not be protected at all. It makes sense to ignore an
> unprotected parent in this scenario to achieve the flexibility.

I'm really confused by this. Why don't you just set memory.{min,low} in the docker cgroup and only propagate it to the children that want it?

If you only want some children to have the protection, only request it in those children, or create an additional intermediate layer of the cgroup hierarchy with protections further limited if you don't trust the task to request the right amount.
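
To illustrate, here is a minimal sketch of the propagation rule, not the kernel code. It assumes a simplified model in which a cgroup's effective protection is its own memory.min capped by every ancestor's value, and it ignores the usage-proportional split among siblings that mem_cgroup_protected() also performs. The helper name effective_min() and the 4 GiB figure are hypothetical, chosen only for the example.

```python
# Simplified model of hierarchical memory.min propagation (not kernel code).
# Assumption: a cgroup's effective protection is capped by each ancestor's
# setting; the usage-proportional split among siblings is ignored.

MAX = float("inf")  # stands in for the "max" setting
GIB = 1 << 30


def effective_min(path, memory_min):
    """Return the effective memory.min for the last cgroup in `path`.

    `path` lists cgroup names from the top level down, e.g. ["docker", "c1"].
    `memory_min` maps each name to its configured memory.min in bytes.
    """
    emin = MAX  # the root itself imposes no cap
    for name in path:
        emin = min(emin, memory_min[name])
    return emin


# Layout from the report: "docker" is unprotected, so c1 ends up with 0.
reported = {"docker": 0, "c1": MAX, "c2": 0}
print(effective_min(["docker", "c1"], reported))

# Suggested layout: protect "docker" (hypothetical 4 GiB) and request the
# protection only in c1; c2 stays unprotected.
suggested = {"docker": 4 * GIB, "c1": MAX, "c2": 0}
print(effective_min(["docker", "c1"], suggested))
print(effective_min(["docker", "c2"], suggested))
```

Under this model, c1 is protected up to whatever "docker" is granted, and c2 gets nothing because it requests nothing.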

Breaking the requirement for hierarchical propagation of protections seems like a really questionable API change, not least because it makes it harder to set systemwide policies about the constraints of protections within a subtree.