Re: Memory overcommit

From: David Rientjes
Date: Fri Oct 30 2009 - 05:10:54 EST


On Fri, 30 Oct 2009, KAMEZAWA Hiroyuki wrote:

> As I wrote repeatedly,
>
> - The OOM-Killer itself is a bad thing, a bad situation.

Not necessarily; the memory controller and cpusets use it quite often to
enforce their policies, and it is standard runtime behavior. We'd like to
imagine that our cpuset will never be too small to run all the attached
jobs, but that happens, and we can easily recover from it by killing a
task.

> - The kernel can't know whether a program is bad or not; it can only guess.

Totally irrelevant, given your fourth point about /proc/pid/oom_adj. We
can tell the kernel how we'd like the oom killer to behave if the
situation arises.

> - Then, there is no "correct" OOM-Killer other than a fork-bomb killer.

Well of course there is; you're seeing this in a WAY too simplistic
manner. If we are oom, we want to be able to influence how the oom killer
behaves and responds to that situation. You are proposing that we change
the baseline for how the oom killer selects tasks, which we rely on
CONSTANTLY as part of our normal production environment. I'd appreciate
it if you'd take it a little more seriously.

> - The user has a knob, oom_adj. This is very strong.
>

Agreed.

> Then, there can only be a "reasonable" or "easy-to-understand" OOM-Kill.
> "Current biggest memory eater is killed" sounds reasonable, easy to
> understand. And if total_vm works well, overcommit_guess should catch it.
> Please improve overcommit_guess if you want to stay on total_vm.
>

I don't necessarily want to stay on total_vm, but I also don't want to
move to rss as a baseline, as you would probably agree.

We disagree about a very fundamental principle: you are coming from a
perspective of always wanting to kill the biggest resident memory eater,
even for a single failed order-0 allocation, and I'm coming from a
perspective of wanting to know in advance how the oom killer will react
on our machines when it is used. Moving to rss reduces the ability of the
user to specify an expected oom priority other than by polarizing it:
either disabling it completely with an oom_adj value of -17 or choosing
the definite next victim with +15. That's my objection to it: the user
cannot possibly be expected to predict what proportion of each
application's memory will be resident at the time of oom.

I understand you want to totally rewrite the oom killer for whatever
reason, but I think you need to spend a lot more time understanding the
needs that the Linux community has for its behavior instead of insisting
on your point of view.