Re: [PATCH] mm: add config option to select the initial overcommit mode

From: Sebastian Frias
Date: Fri May 13 2016 - 10:35:27 EST


Hi Austin,

On 05/13/2016 03:51 PM, Austin S. Hemmelgarn wrote:
> On 2016-05-13 09:32, Sebastian Frias wrote:
>> I didn't see that in Documentation/vm/overcommit-accounting or am I looking in the wrong place?
> It's controlled by a sysctl value, so it's listed in Documentation/sysctl/vm.txt
> The relevant sysctl is vm.oom_kill_allocating_task

Thanks, I just read that.
It does not look like a replacement for overcommit=never, though.
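
For anyone who wants to check what their own machine is set to, here is a minimal sketch (not part of the patch) that reads the two sysctls mentioned so far from /proc/sys/vm. The helper names read_vm_sysctl and describe are made up for illustration; the 0/1/2 mapping is the one given in Documentation/vm/overcommit-accounting, where 2 is what I have been calling overcommit=never.

```python
from pathlib import Path

# Meaning of vm.overcommit_memory values, per Documentation/vm/overcommit-accounting.
OVERCOMMIT_MODES = {0: "heuristic", 1: "always", 2: "never"}


def read_vm_sysctl(name, root="/proc/sys/vm"):
    """Return the integer value of a vm sysctl, or None if it cannot be read."""
    try:
        return int((Path(root) / name).read_text().strip())
    except (OSError, ValueError):
        # Missing file (e.g. not Linux), no permission, or unexpected content.
        return None


def describe(root="/proc/sys/vm"):
    """Summarize the overcommit mode and the oom_kill_allocating_task flag."""
    mode = read_vm_sysctl("overcommit_memory", root)
    kill_allocating = read_vm_sysctl("oom_kill_allocating_task", root)
    return {
        "overcommit_mode": OVERCOMMIT_MODES.get(mode, "unknown"),
        "oom_kill_allocating_task": kill_allocating,
    }


if __name__ == "__main__":
    print(describe())
```

The root parameter is only there so the helpers can be pointed at a scratch directory instead of the real /proc/sys/vm.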

>>
>>>>
>>>> Well, it's hard to report, since it is essentially the result of a dynamic system.
>>>> I could assume it killed terminals with a long history buffer, or editors with many buffers (or big buffers).
>>>> Actually when it happened, I just turned overcommit off. I just checked and it is on again on my desktop; I probably forgot to make it a permanent setting.
>>>>
>>>> In the end, no process is a good candidate for termination.
>>>> What works for you may not work for me, that's the whole point, there's a heuristic (which conceptually can never be perfect), yet the mere fact that some process has to be killed is somewhat chilling.
>>>> I mean, all running processes are supposedly there and running for a reason.
>>> OTOH, just because something is there for a reason doesn't mean it's doing what it's supposed to be. Bugs happen, including memory leaks, and if something is misbehaving enough that it impacts the rest of the system, it really should be dealt with.
>>
>> Exactly, it's just that in this case, the system is deciding how to deal with the situation by itself.
> On a busy server where uptime is critical, you can't wait for someone to notice and handle it manually, you need the issue resolved ASAP. Now, this won't always kill the correct thing, but if it's due to a memory leak, it often will work like it should.

The keyword is "'often' will work as expected".
So you are saying that it will kill a program leaking memory in what, like 90% of the cases?
I'm not sure I would set up a server with critical uptime to have the OOM-killer enabled. Do you think that would be a good idea?

Anyway, as a side note, I just want to say thank you guys for having this discussion.
I think it is an interesting thread and hopefully it will advance the "knowledge" about this setting.

>>
>>>
>>> This brings to mind a complex bug involving Tor and GCC whereby building certain (old) versions of Tor with certain (old) versions of GCC with -Os would cause an infinite loop in GCC. You obviously have GCC running for a reason, but that doesn't mean that it's doing what it should be.
>>
>> I'm not sure if I followed the analogy/example, but are you saying that the OOM-killer killed GCC in your example?
>> This seems an odd example though, I mean, shouldn't the guy in front of the computer notice the loop and kill GCC by himself?
> No, I didn't mean as an example of the OOM killer, I just meant as an example of software not doing what it should. It's not as easy to find an example for the OOM killer, so I don't really have a good example. The general concept is the same though, the only difference is there isn't a kernel protection against infinite loops (because they aren't always bugs, while memory leaks and similar are).

So how does the kernel know that a process is "leaking memory" as opposed to just "using lots of memory"? (Wouldn't that be comparable to asking how the kernel knows the difference between an infinite loop and one that is not?)
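
To make the question concrete: the kernel exposes a per-process value in /proc/<pid>/oom_score, and as far as I understand it reflects how much memory the process uses (adjusted by oom_score_adj), not whether that memory was leaked. The rough sketch below lists processes by that value; the function name oom_scores is hypothetical, and the proc_root parameter exists only so it can be tried against a fake directory tree.

```python
import os
from pathlib import Path


def oom_scores(proc_root="/proc"):
    """Return (score, pid, name) tuples, highest oom_score first."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return []  # no procfs available (e.g. not Linux)

    rows = []
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as "sys" or "meminfo"
        base = Path(proc_root) / entry
        try:
            score = int((base / "oom_score").read_text())
            name = (base / "comm").read_text().strip()
        except (OSError, ValueError):
            continue  # process exited meanwhile, or file not readable
        rows.append((score, int(entry), name))
    return sorted(rows, reverse=True)


if __name__ == "__main__":
    # Show the ten processes with the highest score.
    for score, pid, name in oom_scores()[:10]:
        print(f"{score:6d} {pid:7d} {name}")
```

Nothing in that listing says "leak"; a terminal with a long history buffer and a leaking daemon would simply both show up with a high number.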

Best regards,

Sebastian