Re: [PATCH] mm: add config option to select the initial overcommit mode

From: Sebastian Frias
Date: Wed May 18 2016 - 11:19:00 EST


Hi Michal,

On 05/17/2016 10:16 PM, Michal Hocko wrote:
> On Tue 17-05-16 18:16:58, Sebastian Frias wrote:
> [...]
>> From reading Documentation/cgroup-v1/memory.txt (and from a few
>> replies here talking about cgroups), it looks like the OOM-killer is
>> still being actively discussed, well, there's also "cgroup-v2".
>> My understanding is that cgroup's memory control will pause processes
>> in a given cgroup until the OOM situation is solved for that cgroup,
>> right?
>
> It will be blocked waiting either for some external action which would
> result in OOM condition going away or any other charge release. You have
> to configure memcg for that though. The default behavior is to invoke
> the same OOM killer algorithm which is just reduced to tasks from the
> memcg (hierarchy).

Ok, I see, thanks!
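
A minimal sketch of the "configure memcg" step mentioned above, assuming the
cgroup-v1 memory controller is mounted at /sys/fs/cgroup/memory and that a
group named "demo" already exists (both the mount point and the group name are
assumptions for illustration). Per Documentation/cgroup-v1/memory.txt, writing
1 to memory.oom_control disables the OOM killer for that group, so tasks that
hit the limit wait instead of being killed. The helper names are hypothetical.

```python
import os

# Assumption: cgroup-v1 memory controller mount point; adjust for your system.
CGROUP_ROOT = "/sys/fs/cgroup/memory"


def parse_oom_control(text):
    """Parse the 'key value' lines of memory.oom_control into a dict of ints."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            result[parts[0]] = int(parts[1])
    return result


def disable_oom_killer(group, root=CGROUP_ROOT):
    """Write 1 to <root>/<group>/memory.oom_control (needs privileges)."""
    path = os.path.join(root, group, "memory.oom_control")
    with open(path, "w") as f:
        f.write("1")


def read_oom_control(group, root=CGROUP_ROOT):
    """Return the parsed contents of <root>/<group>/memory.oom_control."""
    path = os.path.join(root, group, "memory.oom_control")
    with open(path) as f:
        return parse_oom_control(f.read())
```

With the hypothetical "demo" group, disable_oom_killer("demo") followed by
read_oom_control("demo") should then report oom_kill_disable as 1, and
under_oom indicates whether tasks in the group are currently blocked.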

>
>> If that is right, it means that there is indeed a way to deal
>> with an OOM situation (stack expansion, COW failure, 'memory hog',
>> etc.) in a better way than the OOM-killer, right?
>> In which case, do you guys know if there is a way to make the whole
>> system behave as if it was inside a cgroup? (*)
>
> No it is not. You have to realize that the system wide and the memcg OOM
> situations are quite different. There is usually quite some memory free
> when you hit the memcg OOM so the administrator can actually do
> something.

Ok, so it works like the 5% reserved for 'root' on filesystems?

> The global OOM means there is _no_ memory at all. Many kernel
> operations will need some memory to do something useful. Let's say you
> would want to do an educated guess about who to kill - most proc APIs
> will need to allocate. And this is just a beginning. Things are getting
> really nasty when you get deeper and deeper. E.g. the OOM killer has to
> give the oom victim access to memory reserves so that the task can exit
> because that path needs to allocate as well.

Really? I would have thought that once SIGKILL is sent, the victim process is not expected to do anything else, and thus its memory could be reclaimed immediately.
Or is the OOM-killer more of an OOM-terminator (i.e., it sends SIGTERM)?

> So even if you wanted to
> give userspace some chance to resolve the OOM situation you would either
> need some special API to tell "this process is really special and it can
> access memory reserves and it has an absolute priority etc." or have an
> in-kernel fallback to do something or your system could lock up really
> easily.
>

I see, so at least two cgroups would be needed: one reserved for handling the OOM situation through some API, and another for the "rest of the system".
In other words, just like the 5% reserved for 'root' on filesystems.
Do you think that would work?

Best regards,

Sebastian