Re: [PATCH] mm: add config option to select the initial overcommit mode

From: Austin S. Hemmelgarn
Date: Fri May 13 2016 - 11:02:31 EST


On 2016-05-13 10:23, Sebastian Frias wrote:
Hi Austin,

On 05/13/2016 04:14 PM, Austin S. Hemmelgarn wrote:
On 2016-05-13 09:34, Sebastian Frias wrote:
Hi Austin,

On 05/13/2016 03:11 PM, Austin S. Hemmelgarn wrote:
On 2016-05-13 08:39, Sebastian Frias wrote:

My point is that it seems to be possible to deal with such conditions in a more controlled way, i.e., a way that is less random and less abrupt.
There's an option for the OOM-killer to just kill the allocating task instead of using the scoring heuristic. This is about as deterministic as things can get though.
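For reference, a minimal sketch of how one might inspect the relevant settings from C on Linux. The thread does not name the knob; the assumption here is that the option meant above is the vm.oom_kill_allocating_task sysctl, exposed together with vm.overcommit_memory under /proc/sys/vm. The helper name read_vm_knob is made up for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: read an integer knob from /proc/sys/vm/<name>.
 * Returns the value, or -1 if the file is missing or unreadable.
 * (The knobs of interest here only hold non-negative values.) */
static long read_vm_knob(const char *name)
{
    char path[128];
    int len = snprintf(path, sizeof path, "/proc/sys/vm/%s", name);
    if (len < 0 || (size_t)len >= sizeof path)
        return -1;

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    long value = -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

Calling read_vm_knob("oom_kill_allocating_task") should give 0 when the scoring heuristic is in use and non-zero when the allocating task is killed instead; read_vm_knob("overcommit_memory") gives the current overcommit mode (0, 1 or 2).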

By the way, why does it have to "kill" anything in that case?
I mean, shouldn't it just tell the allocating task that there's not enough memory by letting malloc return NULL?
In theory, that's a great idea. In practice though, it only works if:
1. The allocating task correctly handles malloc() (or whatever other function it uses) returning NULL, which a number of programs don't.
2. The task actually has fallback options for memory limits. Many programs that do handle getting a NULL pointer from malloc() handle it by exiting anyway, so there's not as much value in this case.
3. There isn't a memory leak somewhere on the system. Killing the allocating task doesn't help much if this is the case of course.
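To make points 1 and 2 concrete, here is a rough sketch of what "handling NULL with a fallback" could look like. The helper alloc_with_fallback and its halving policy are invented for illustration; a real program would pick a fallback that suits its own workload.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: try to allocate `want` bytes; on failure keep
 * halving the request until it would drop below `min`. On success,
 * *got holds the size actually obtained. Returns NULL (and sets *got
 * to 0) if no size in [min, want] could be allocated. */
static void *alloc_with_fallback(size_t want, size_t min, size_t *got)
{
    if (min == 0)
        min = 1;    /* guarantees the loop below terminates */

    for (size_t n = want; n >= min; n /= 2) {
        void *p = malloc(n);
        if (p != NULL) {
            *got = n;
            return p;
        }
    }

    *got = 0;
    return NULL;
}
```

The caller then has to cope with getting less than it asked for, which is exactly the part many programs skip. Also note that with overcommit enabled, malloc() will rarely return NULL in the first place, so this path is hard to exercise in testing.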

Well, the thing is that the current behaviour, i.e., overcommitting, does not improve the quality of those programs.
I mean, what incentive do they have to properly handle situations 1, 2?
Overcommit got introduced because of these, not the other way around. It's not forcing them to change, but it's also a core concept in any modern virtual memory based OS, and that's not ever going to change either.

You also have to keep in mind that most apps aren't doing this intentionally. There are three general reasons they do this:
1. They don't know how much memory they will need, so they guess high because malloc() is computationally expensive. This is technically intentional, but it's also something that can't be avoided in some cases. Dropbox is a perfect example of this taken way too far (they also take the concept of a thread pool too far).
2. The program has a lot of code that isn't frequently run. It makes no sense to keep code that isn't used in RAM, so it gets either dropped (if it's unmodified), or it gets swapped out. Most of the programs that I see on my system fall into this category (acpid for example just sleeps until an ACPI event happens, so it usually won't have most of its code in memory on a busy system).
3. The application wants to do its own memory management. This is common on a lot of HPC apps and some high performance server software.

Also, if there's a memory leak, the termination of any task, whether it is the allocating task or something random, does not help either; the system will eventually go down, right?
If the memory leak is in the kernel, then yes, the OOM killer won't help, period. But if the memory leak is in userspace, and the OOM killer kills the task with the leak (which it usually will if you don't have it set to kill the allocating task), then it may have just saved the system from crashing completely. Yes, some user may lose some unsaved work, but they would lose that data anyway if the system crashes, and they can probably still use the rest of the system.
You have to keep in mind though, that on a properly provisioned system, the only situations where the OOM killer should be invoked are when there's a memory leak, or when someone is intentionally trying to DoS the system through memory exhaustion.

Exactly, the DoS attack is another reason why the OOM-killer does not seem a good idea, at least compared to just letting malloc return NULL and letting the program fail.
Because of overcommit, it's possible for the allocation to succeed, but the subsequent access to fail. At that point, you're way past malloc() returning, and you have to do something.
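A small sketch of why the allocation and the access are separate events: on Linux, an anonymous mapping costs no physical pages until it is written to. The code below maps a few pages, counts how many are resident with mincore(), touches some of them, and counts again. DEMO_PAGES, count_resident and overcommit_demo are names made up for this example, and it assumes a Linux system where mincore() and MAP_ANONYMOUS are available.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define DEMO_PAGES 64   /* size of the demo mapping, in pages (arbitrary) */

/* Count how many of the DEMO_PAGES pages at addr are resident in RAM.
 * Returns -1 if mincore() fails. */
static long count_resident(void *addr, size_t page)
{
    unsigned char vec[DEMO_PAGES];
    if (mincore(addr, DEMO_PAGES * page, vec) != 0)
        return -1;

    long n = 0;
    for (size_t i = 0; i < DEMO_PAGES; i++)
        n += vec[i] & 1;    /* low bit set means the page is resident */
    return n;
}

/* Map DEMO_PAGES anonymous pages, write to the first `touch` of them,
 * and report the resident page count before and after the writes.
 * Returns 0 on success, -1 on error. */
static int overcommit_demo(size_t touch, long *before, long *after)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0 || touch > DEMO_PAGES)
        return -1;
    size_t page = (size_t)ps;
    size_t len = DEMO_PAGES * page;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    *before = count_resident(p, page);

    /* The mapping already "succeeded"; only these writes make the
     * kernel find real pages, and this is where it can come up short. */
    volatile unsigned char *vp = p;
    for (size_t i = 0; i < touch; i++)
        vp[i * page] = 1;

    *after = count_resident(p, page);

    munmap(p, len);
    return (*before < 0 || *after < 0) ? -1 : 0;
}
```

On a typical system the "before" count is zero and the "after" count is at least the number of pages written, though things like transparent huge pages can make more pages resident than were touched.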

Also, returning NULL on a failed malloc() provides zero protection against all but the most brain-dead memory exhaustion based DoS attacks. The general core of a memory exhaustion DoS against a local system follows a simple three step procedure:
1. Try to allocate a small chunk of memory (less than or equal to page size)
2. If the allocation succeeded, write to the first byte of that chunk of memory, forcing actual allocation
3. Repeat indefinitely from step 1
Step 2 is the crucial part here: if you don't write to the memory, it will only eat up your own virtual address space. If you don't check for a NULL pointer and skip the write when you get one, you get a segfault. If the OOM killer isn't invoked in such a situation, then this will just eat up all the free system memory, and then _keep running_ and eat up all the other memory as it's freed by other things exiting due to lack of memory.
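For illustration only, the loop looks roughly like the sketch below. Two things are deliberately different from the procedure described above: it stops after a fixed number of iterations (the `limit` parameter) and it frees everything before returning, so it is safe to run. The function name touch_chunks is made up for this example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Bounded version of the allocate-and-touch loop: at most `limit`
 * allocations of `chunk` bytes each, all freed before returning.
 * Returns the number of chunks that were allocated and written to. */
static size_t touch_chunks(size_t chunk, size_t limit)
{
    if (chunk == 0 || limit == 0)
        return 0;

    /* Remember every chunk so it can be released afterwards. */
    unsigned char **held = calloc(limit, sizeof *held);
    if (held == NULL)
        return 0;

    size_t n = 0;
    while (n < limit) {                     /* step 3: repeat */
        unsigned char *p = malloc(chunk);   /* step 1: small allocation */
        if (p == NULL)
            break;                          /* writing here would segfault */
        *(volatile unsigned char *)p = 1;   /* step 2: force a real page */
        held[n++] = p;
    }

    for (size_t i = 0; i < n; i++)
        free(held[i]);
    free(held);
    return n;
}
```

Without the cap and the cleanup, a NULL return just ends (or pauses) the loop; it does nothing to give back the pages that were already touched, which is why returning NULL from malloc() is not a defence here.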