Re: [PATCH] mm: add config option to select the initial overcommit mode

From: Michal Hocko
Date: Tue May 17 2016 - 16:16:50 EST


On Tue 17-05-16 18:16:58, Sebastian Frias wrote:
[...]
> From reading Documentation/cgroup-v1/memory.txt (and from a few
> replies here talking about cgroups), it looks like the OOM-killer is
> still being actively discussed, well, there's also "cgroup-v2".
> My understanding is that cgroup's memory control will pause processes
> in a given cgroup until the OOM situation is solved for that cgroup,
> right?

Tasks in that memcg will block, waiting either for some external action
that makes the OOM condition go away or for some other charge to be
released. You have to configure the memcg for that, though. The default
behavior is to invoke the same OOM killer algorithm, just restricted to
the tasks in the memcg (hierarchy).
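For cgroup v1 (the Documentation/cgroup-v1/memory.txt interface quoted
above), that configuration amounts to writing "1" to the group's
memory.oom_control, which disables the memcg OOM killer so tasks block
instead. Optionally, an eventfd can be registered through
cgroup.event_control so that a userspace agent learns when the group hits
OOM. This is a minimal sketch: it assumes a v1 memory controller mounted
at a hypothetical path such as /sys/fs/cgroup/memory/mygroup, the helper
names are made up for illustration, and error handling is kept short.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: build the "<event_fd> <oom_control_fd>" line that
 * cgroup.event_control expects. Returns the length, or -1 if it does
 * not fit in 'len' bytes. */
int format_oom_registration(char *buf, size_t len, int efd, int ofd)
{
    int n = snprintf(buf, len, "%d %d", efd, ofd);

    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}

/* Write a short string to a cgroup control file. */
static int write_str(const char *path, const char *s)
{
    size_t len = strlen(s);
    int fd = open(path, O_WRONLY);
    ssize_t n;

    if (fd < 0)
        return -1;
    n = write(fd, s, len);
    close(fd);
    return n == (ssize_t)len ? 0 : -1;
}

/* Hypothetical helper: disable the memcg OOM killer for the v1 cgroup
 * directory 'cg' and return an eventfd that becomes readable when the
 * group hits OOM, or -1 on error. */
int setup_oom_notifier(const char *cg)
{
    char path[512], line[64];
    int efd, ofd;

    /* 1) "1" in memory.oom_control: tasks now wait instead of being killed. */
    snprintf(path, sizeof path, "%s/memory.oom_control", cg);
    if (write_str(path, "1") < 0)
        return -1;

    /* 2) An eventfd plus an fd for memory.oom_control (path still holds it). */
    efd = eventfd(0, 0);
    if (efd < 0)
        return -1;
    ofd = open(path, O_RDONLY);
    if (ofd < 0) {
        close(efd);
        return -1;
    }

    /* 3) Register the pair via cgroup.event_control. */
    snprintf(path, sizeof path, "%s/cgroup.event_control", cg);
    if (format_oom_registration(line, sizeof line, efd, ofd) < 0 ||
        write_str(path, line) < 0) {
        close(ofd);
        close(efd);
        return -1;
    }
    /* ofd is simply left open for the notifier's lifetime in this sketch. */
    return efd;
}
```

A caller would then block in read(efd, &count, sizeof(uint64_t)) until
the group hits OOM. At that point it could, for example, raise
memory.limit_in_bytes or kill a task in the group, which is the kind of
"external action" that lets the blocked tasks continue.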

> If that is right, it means that there is indeed a way to deal
> with an OOM situation (stack expansion, COW failure, 'memory hog',
> etc.) in a better way than the OOM-killer, right?
> In which case, do you guys know if there is a way to make the whole
> system behave as if it was inside a cgroup? (*)

No, there isn't. You have to realize that system-wide and memcg OOM
situations are quite different. When you hit a memcg OOM there is
usually still plenty of free memory in the system, so the administrator
can actually do something. A global OOM means there is _no_ memory at all.
Many kernel operations need some memory to do anything useful. Say you
wanted to make an educated guess about whom to kill: most proc APIs will
need to allocate. And that is just the beginning; things get really nasty
the deeper you go. E.g. the OOM killer has to give the OOM victim access
to memory reserves so that the task can exit, because the exit path needs
to allocate as well. So even if you wanted to give userspace some chance
to resolve the OOM situation, you would need either some special API to
say "this process is really special, it can access memory reserves, it
has absolute priority, etc." or an in-kernel fallback to do something;
otherwise your system could lock up really easily.
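A toy model (not kernel code) of why the victim needs access to reserves
in order to exit: it assumes a single hypothetical zone with one "min"
watermark, below which ordinary allocations fail, while an allocation made
on behalf of the OOM victim may dip into the reserve. The real page
allocator has several watermarks and considerably more involved rules;
toy_zone and toy_alloc are illustrative names only.

```c
#include <stdbool.h>

/* Hypothetical single zone: 'free_pages' pages are free, and the last
 * 'min_watermark' of them are held back as a reserve. */
struct toy_zone {
    long free_pages;
    long min_watermark;
};

/* Allocate 'nr' pages. Ordinary callers must leave at least
 * 'min_watermark' pages free; a caller flagged as the OOM victim may
 * use the reserve, down to zero free pages. Returns true on success. */
bool toy_alloc(struct toy_zone *z, long nr, bool oom_victim)
{
    long limit = oom_victim ? 0 : z->min_watermark;

    if (nr <= 0 || z->free_pages - nr < limit)
        return false;
    z->free_pages -= nr;
    return true;
}
```

With the zone sitting at its watermark (the global OOM case), an ordinary
toy_alloc(&z, 1, false) fails, so an exit path that needs even one page
would stall; the same request with oom_victim set succeeds. Any userspace
OOM handler would need a comparable privilege just to make progress.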
--
Michal Hocko
SUSE Labs