Re: [BUGFIX][PATCH] oom-kill: fix NUMA constraint check with nodemask v4.2

From: David Rientjes
Date: Fri Dec 18 2009 - 05:05:17 EST


On Fri, 18 Dec 2009, KOSAKI Motohiro wrote:

> > That is in contrast to using rss as a baseline, where we prefer killing the
> > application with the most resident RAM. It is not always ideal to kill a
> > task with 8GB of rss when we fail to allocate a single page for a low
> > priority task.
>
> VSZ has the same problem if a low priority task allocates the last single page.
>

I don't understand what you're trying to say, sorry. Why, in your mind,
do we always want to prefer to kill the application with the largest
amount of memory present in physical RAM for a single, failed order-0
allocation attempt from a lower priority task?

Additionally, when would it be sufficient to simply fail a ~__GFP_NOFAIL
allocation instead of killing anything?

> Yes, possible. However, its heuristic is intentional. The code comment says:
>
> /*
> * If p's nodes don't overlap ours, it may still help to kill p
> * because p may have allocated or otherwise mapped memory on
> * this node before. However it will be less likely.
> */
>
> Do you have an alternative plan? How do we know the task doesn't have any
> pages on the memory-exhausted node? We can't add any statistics for oom because
> almost no systems ever hit oom; thus, many developers oppose such a slowdown.
>

There's nothing wrong with that currently (except it doesn't work for
mempolicies); I'm stating that it is a requirement that we keep such a
penalization in our heuristic if we plan on rewriting it. I was
attempting to get a list of requirements for oom killing decisions so that
we can write a sane heuristic, and you're simply defending the status quo
which you insist we should change.
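
For illustration, the kind of penalization described in that code comment
can be modelled in user space roughly as below. This is a minimal sketch,
not the kernel code: the bitmask type, the function name
penalize_foreign_nodes() and the divisor of 8 are all assumptions made for
the example.

```c
/* Hypothetical user-space model: one bit per NUMA node. */
typedef unsigned long nodemask_bits;

/*
 * Scale down a candidate's badness score when the nodes it is allowed
 * to allocate from do not intersect the nodes of the context that hit
 * the oom condition. Killing such a task may still free memory on the
 * exhausted node, but it is less likely, so it is penalized rather
 * than excluded. The divisor of 8 is an assumption for illustration.
 */
static unsigned long penalize_foreign_nodes(unsigned long points,
                                            nodemask_bits candidate_nodes,
                                            nodemask_bits oom_nodes)
{
    if ((candidate_nodes & oom_nodes) == 0)
        points /= 8;
    return points;
}
```

With these assumptions, a candidate confined to nodes 0-1 (mask 0x3) is
penalized when the oom happens on nodes 2-3 (mask 0xc), while a candidate
on nodes 1-2 (mask 0x6) keeps its full score.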

> > We need to be able to polarize tasks so they are always killed regardless
> > of any kernel heuristic (/proc/pid/oom_adj of +15, currently) or always
> > chosen last (-16, currently). We also need a way of completely disabling
> > oom killing for certain tasks such as with OOM_DISABLE.
>
> afaik, when an admin uses a +15 or -16 adjustment, they usually hope not to
> use the kernel heuristic.

That's exactly what I said above.
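
To make the polarization concrete, here is a minimal user-space sketch of
how an oom_adj value can be applied to a score as a bit shift. The function
name apply_oom_adj() is hypothetical, the constant values are assumptions
for the example, and returning 0 for OOM_DISABLE only stands in for "never
select this task".

```c
/* Assumed values for illustration. */
#define OOM_DISABLE     (-17)
#define OOM_ADJUST_MIN  (-16)
#define OOM_ADJUST_MAX  15

/*
 * Apply a per-task adjustment to a heuristic score.
 * Positive values shift the score left (more likely to be killed),
 * negative values shift it right (less likely to be killed).
 */
static unsigned long apply_oom_adj(unsigned long points, int oom_adj)
{
    if (oom_adj == OOM_DISABLE)
        return 0;               /* modelled as "never selected" */

    if (oom_adj > 0) {
        if (!points)
            points = 1;         /* so the shift still has an effect */
        points <<= oom_adj;
    } else if (oom_adj < 0) {
        points >>= -oom_adj;
    }
    return points;
}
```

Under this model a score of 1000 becomes 32768000 at +15 and 0 at -16,
which is why those two settings effectively override whatever the
heuristic computed.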

> This is the reason that I proposed the new /proc/pid/oom_priority
> tunable knob.
>

In addition to /proc/pid/oom_adj? oom_priority on its own does not
allow us to define when a task is a memory leaker based on the expected
memory consumption of a single application. That should be the single
biggest consideration in the new badness heuristic: to define when a task
should be killed because it is rogue.