Re: [PATCH] mm: add config option to select the initial overcommit mode

From: Austin S. Hemmelgarn
Date: Fri May 13 2016 - 09:51:57 EST


On 2016-05-13 09:32, Sebastian Frias wrote:
> Hi Austin,
>
> On 05/13/2016 03:11 PM, Austin S. Hemmelgarn wrote:
>> On 2016-05-13 08:39, Sebastian Frias wrote:
>>> Well, a more urgent problem would be that in that case overcommit=never is not really well tested.
>> I know more people who use overcommit=never than overcommit=always. I use it myself on all my personal systems, but I also allocate significant amounts of swap space (usually 64G, but I also have big disks in my systems and don't often hit swap), don't use Java, and generally don't use a lot of the more wasteful programs either (many of them on desktop systems tend to be stuff like office software). I know a number of people who use overcommit=never on their servers and give them a decent amount of swap space (and again, don't use Java).
>
> Then I'll look into LTP and the issues it has when overcommit=never.
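For what it's worth, the difference between the modes is easy to see from userspace with a probe along these lines (a minimal sketch of my own, assuming a 64-bit system; it is not from LTP or the docs, and the exact numbers depend on CommitLimit in /proc/meminfo):

/* Reserve memory in 64 MiB chunks without touching it, and report
 * how much address space the kernel is willing to promise.  Under
 * overcommit=never the total stops near CommitLimit; under
 * overcommit=always it keeps going far past RAM + swap, because
 * nothing is committed until the pages are actually touched. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t chunk = 64UL << 20;  /* 64 MiB per request */
    size_t total = 0;

    /* Leak on purpose; this is a probe.  Stop at 1 TiB, which is
     * clearly overcommitted on any machine I own. */
    while (total < (1UL << 40) && malloc(chunk))
        total += chunk;

    printf("address space granted: %zu MiB\n", total >> 20);
    return 0;
}

Under overcommit=never the reported number tracks the commit limit (swap plus vm.overcommit_ratio percent of RAM); under overcommit=always it runs into the sketch's 1 TiB cap long before the kernel refuses anything.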


>>> My point is that it seems to be possible to deal with such conditions in a more controlled way, i.e. a way that is less random and less abrupt.
>> There's an option for the OOM-killer to just kill the allocating task instead of using the scoring heuristic. This is about as deterministic as things can get, though.
>
> I didn't see that in Documentation/vm/overcommit-accounting, or am I looking in the wrong place?
It's controlled by a sysctl value, so it's listed in Documentation/sysctl/vm.txt. The relevant sysctl is vm.oom_kill_allocating_task.
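For reference, it can be flipped at runtime with "sysctl -w vm.oom_kill_allocating_task=1", or programmatically by writing the procfs entry behind the sysctl; a minimal sketch (needs root, and the change does not persist across reboots):

/* Enable "kill the allocating task" OOM behaviour by writing to the
 * procfs file backing the sysctl.  Needs root; add the setting to
 * /etc/sysctl.conf to keep it across reboots. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/sys/vm/oom_kill_allocating_task", "w");

    if (!f) {
        perror("oom_kill_allocating_task");
        return 1;
    }
    fputs("1\n", f);
    return fclose(f) ? 1 : 0;
}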


> Well, it's hard to report, since it is essentially the result of a dynamic system.
> I could assume it killed terminals with a long history buffer, or editors with many buffers (or big buffers).
> Actually, when it happened I just turned overcommit off. I just checked and it is on again on my desktop; I probably forgot to make the setting permanent.
>
>>> In the end, no process is a good candidate for termination.
>>> What works for you may not work for me; that's the whole point. There's a heuristic (which conceptually can never be perfect), yet the mere fact that some process has to be killed is somewhat chilling.
>>> I mean, all running processes are supposedly there and running for a reason.
>> OTOH, just because something is there for a reason doesn't mean it's doing what it's supposed to do. Bugs happen, including memory leaks, and if something is misbehaving enough that it impacts the rest of the system, it really should be dealt with.
>
> Exactly, it's just that in this case, the system is deciding how to deal with the situation by itself.
On a busy server where uptime is critical, you can't wait for someone to notice and handle it manually; you need the issue resolved ASAP. Now, this won't always kill the correct thing, but if the problem is a memory leak, it will often work like it should.
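And for the cases where you do know what must not be killed, there is a per-process knob for steering the heuristic: /proc/<pid>/oom_score_adj ranges from -1000 (exempt from OOM killing) to +1000 (preferred victim). A minimal sketch of a process protecting itself (lowering the value needs CAP_SYS_RESOURCE; raising it does not):

/* Bias the OOM killer's badness score for the current process by
 * writing to /proc/self/oom_score_adj.  -1000 makes the task exempt
 * from OOM killing, +1000 makes it the preferred victim. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/self/oom_score_adj", "w");

    if (!f) {
        perror("oom_score_adj");
        return 1;
    }
    fputs("-1000\n", f);  /* exempt this process from the OOM killer */
    return fclose(f) ? 1 : 0;
}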


>> This brings to mind a complex bug involving Tor and GCC, whereby building certain (old) versions of Tor with certain (old) versions of GCC with -Os would cause an infinite loop in GCC. You obviously have GCC running for a reason, but that doesn't mean that it's doing what it should be.
>
> I'm not sure I followed the analogy/example, but are you saying that the OOM-killer killed GCC in your example?
> This seems an odd example though; I mean, shouldn't the person in front of the computer notice the loop and kill GCC himself?
No, I didn't mean it as an example of the OOM killer, just as an example of software not doing what it should. It's not as easy to find a good example for the OOM killer. The general concept is the same though; the only difference is that there isn't a kernel protection against infinite loops (because they aren't always bugs, while memory leaks and similar are).