Re: [PATCH] mm: add config option to select the initial overcommit mode

From: Michal Hocko
Date: Thu May 19 2016 - 03:14:34 EST


On Wed 18-05-16 17:18:45, Sebastian Frias wrote:
> Hi Michal,
>
> On 05/17/2016 10:16 PM, Michal Hocko wrote:
> > On Tue 17-05-16 18:16:58, Sebastian Frias wrote:
[...]
> > The global OOM means there is _no_ memory at all. Many kernel
> > operations will need some memory to do something useful. Let's say you
> > would want to make an educated guess about who to kill - most proc APIs
> > will need to allocate. And this is just the beginning. Things get
> > really nasty as you go deeper and deeper. E.g. the OOM killer has to
> > give the OOM victim access to memory reserves so that the task can exit
> > because that path needs to allocate as well.
>
> Really? I would have thought that once SIGKILL is sent, the
> victim process is not expected to do anything else and thus its
> memory could be claimed immediately. Or is the OOM-killer more of an
> OOM-terminator? (i.e.: sends SIGTERM)

Well, the path to exit is not exactly trivial. Resources have to be
released and that sometimes requires memory. E.g. exit_robust_list
needs to access the futex and that in turn means a page fault if the
memory was swapped out...

> > So even if you wanted to
> > give userspace some chance to resolve the OOM situation you would either
> > need some special API to tell "this process is really special and it can
> > access memory reserves and it has an absolute priority etc." or have an
> > in-kernel fallback to do something, or your system could lock up really
> > easily.
> >
>
> I see, so basically at least two cgroups would be needed, one reserved
> for handling the OOM situation through some API and another for the
> "rest of the system". Basically just like the 5% reserved for 'root'
> on filesystems.

If you want to handle memcg OOM then you can use memory.oom_control (see
Documentation/cgroup-v1/memory.txt for more information) and have the
OOM handler run outside of that memcg.

> Do you think that would work?

But handling the _global_ OOM from userspace is just insane with the
current kernel implementation. It just cannot work reliably.
--
Michal Hocko
SUSE Labs