Re: [PATCH] Revert oom rewrite series

From: David Rientjes
Date: Mon Nov 15 2010 - 18:50:35 EST


On Tue, 16 Nov 2010, Bodo Eggert wrote:

> > CAP_SYS_RESOURCE threads have full control over their oom killing priority
> > by /proc/pid/oom_score_adj
>
> , but unless they were written in the last few months, designed for Linux,
> and the author took the time to research each external process invocation,
> they cannot be aware of this possibility.
>

You're clearly wrong: CAP_SYS_RESOURCE has been required to lower oom_adj
for over five years (as far back as the git history goes). Commit 8fb4fc68,
merged into 2.6.20, allowed tasks to raise their own oom_adj but not to
decrease it. That is unchanged by the rewrite.
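
To make the semantics concrete, here's a minimal shell sketch of that check
against the new oom_score_adj file (the session is illustrative, not copied
from a real box, and the exact error string may differ by shell and kernel):

  $ cat /proc/self/oom_score_adj
  0
  $ echo 500 > /proc/self/oom_score_adj   # raising: always allowed
  $ echo -1 > /proc/self/oom_score_adj    # lowering below the old value
  bash: echo: write error: Permission denied
  # the same write succeeds for a task with CAP_SYS_RESOURCE (e.g. root)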

> Besides that, if each process is supposed to change the default, the default
> is wrong.
>

That doesn't make any sense. If you want to protect a thread from the oom
killer, you're going to need to modify oom_score_adj; the kernel can't know
which tasks you perceive as vital. Having CAP_SYS_RESOURCE alone does not
imply that, it only allows unbounded access to resources. That's
completely orthogonal to the goal of the oom killer heuristic, which is to
find the most memory-hogging task to kill.
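
For the common case of protecting something vital, that is nothing more
than a write to procfs; a minimal sketch, with a made-up daemon name:

  # "mydaemon" is hypothetical; lowering the value needs CAP_SYS_RESOURCE
  echo -1000 > /proc/$(pidof -s mydaemon)/oom_score_adj  # never killed
  # or bias the heuristic without disabling it entirely:
  echo -500 > /proc/$(pidof -s mydaemon)/oom_score_adj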

> 1) The exponential scale did have a low resolution.
>
> 2) The heuristics were developed using much brain power and much
> trial-and-error. You are going back to basics, and some people
> are not convinced that this is better. I googled and I did not
> find a discussion about how and why the new score was designed
> this way.
> Looking at the output of:
> cd /proc; for a in [0-9]*; do
> echo `cat $a/oom_score` $a `perl -pes/'\0.*$'// < $a/cmdline`;
> done|grep -v ^0|sort -n |less
> , I'm not convinced either.
>

The old heuristic was a mixture of arbitrary values that weren't expressed
in any consistent unit, and it would often cause the wrong task to be
targeted because there was no clear goal being achieved. The new heuristic
has a solid goal: to identify and kill the most memory-hogging task that is
eligible given the context in which the oom occurs. If you disagree with
that goal and want any of the old heuristics reintroduced, please show that
they make sense in the oom killer.
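
Roughly speaking, the new score is the task's resident set plus swap usage
expressed in thousandths of allowed memory, shifted by oom_score_adj. A
back-of-the-envelope shell approximation (a sketch only; it ignores details
such as page table pages and cpuset/memcg-constrained totals):

  #!/bin/sh
  # approximate the oom badness of $1 for a whole-system oom, in thousandths
  pid=$1
  rss=$(awk '/^VmRSS:/ {print $2}' /proc/$pid/status)     # kB
  swap=$(awk '/^VmSwap:/ {print $2}' /proc/$pid/status)   # kB, may be empty
  total=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)    # kB
  adj=$(cat /proc/$pid/oom_score_adj)
  echo $(( (${rss:-0} + ${swap:-0}) * 1000 / $total + $adj ))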

> PS) Mapping an exponential value to a linear score is bad. E.g. an
> oom_adj of 8 should make a 1 MB process as likely to be killed as
> a 256 MB process with oom_adj=0.
>

To show that, you would have to point to an application that exists today
that uses oom_adj for something other than polarization, i.e. one that
derives its value from a calculation of allowable memory usage. Such an
application simply doesn't exist.

> PS2) Because I saw this in your presentation PDF: (@udev-people)
> The -17 score of udevd is wrong, since it will even prevent
> the OOM killer from working correctly if it grows to 100 MB:
>

Threads with CAP_SYS_RESOURCE are free to lower the oom_score_adj of any
thread they deem fit, and that includes applications that lower their own
oom_score_adj. The kernel isn't going to prohibit users from setting their
own oom_score_adj.
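
And nothing prevents an administrator who disagrees with a distribution's
choice from simply changing it; a hypothetical example for the udevd case
above (run as root, pidof usage is illustrative):

  # restore the default oom priority for udevd
  echo 0 > /proc/$(pidof -s udevd)/oom_score_adj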