Re: Improving OOM killer

From: David Rientjes
Date: Wed Feb 03 2010 - 14:52:37 EST


On Wed, 3 Feb 2010, Frans Pop wrote:

> > * /proc/pid/oom_adj ranges from -1000 to +1000 to either
> > * completely disable oom killing or always prefer it.
> > */
> > points += p->signal->oom_adj;
> >
>
> Wouldn't that cause a rather huge compatibility issue given that the
> current oom_adj works in a totally different way:
>
> ! 3.1 /proc/<pid>/oom_adj - Adjust the oom-killer score
> ! ------------------------------------------------------
> ! This file can be used to adjust the score used to select which processes
> ! should be killed in an out-of-memory situation. Giving it a high score
> ! will increase the likelihood of this process being killed by the
> ! oom-killer. Valid values are in the range -16 to +15, plus the special
> ! value -17, which disables oom-killing altogether for this process.
>
> ?
>

I thought about whether we'd need an additional, complementary tunable
such as /proc/pid/oom_bias that would apply this new memory-charging bias
in the heuristic. It could be implemented so that writing to oom_adj
would clear oom_bias and vice versa.

Although that would certainly be possible, I didn't propose it for a
couple of reasons:

- it would clutter the space to have two separate tunables when the
metric that /proc/pid/oom_adj uses has been made obsolete by the new
baseline as a fraction of total RAM, and

- we have always exported OOM_DISABLE, OOM_ADJUST_MIN, and OOM_ADJUST_MAX
via include/linux/oom.h so that userspace can use them sanely. Setting
a particular oom_adj value for anything other than OOM_DISABLE means
the score will be relative to other system tasks, so it's a value that
is typically calibrated at runtime rather than a static, hardcoded
value.
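
As a rough userspace illustration of the second point, the sketch below
writes a named constant rather than a bare number. The helper name
write_oom_adj() and the local #defines are assumptions made for this
example; the values simply mirror the range in the documentation quoted
above (-16 to +15, with -17 disabling oom-killing).

```c
#include <assert.h>
#include <stdio.h>

/* Local copies for illustration only; values follow the quoted
 * documentation: -16..+15, with -17 meaning "never oom-kill". */
#define OOM_DISABLE     (-17)
#define OOM_ADJUST_MIN  (-16)
#define OOM_ADJUST_MAX  15

/* Hypothetical helper: write an oom_adj value to the given file,
 * e.g. "/proc/self/oom_adj". Returns 0 on success, -1 on failure. */
static int write_oom_adj(const char *path, int value)
{
    FILE *f;
    int ok;

    assert(path != NULL);
    assert(value >= OOM_DISABLE && value <= OOM_ADJUST_MAX);

    f = fopen(path, "w");
    if (!f)
        return -1;

    ok = fprintf(f, "%d\n", value) > 0;
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

A caller would use something like write_oom_adj("/proc/self/oom_adj",
OOM_DISABLE); lowering the value typically requires privilege, so the
return value should be checked.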

We could reuse /proc/pid/oom_adj for the new heuristic by scaling it with
(oom_adj * 1000 / OOM_ADJUST_MAX), but that would leave it with far
coarser granularity than the new range otherwise allows, and it will
eventually become annoying and much more difficult to document.
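
To make the granularity loss concrete, here is a minimal sketch of that
scaling. The function name oom_adj_to_scaled(), the clamping, and the
mapping of OOM_DISABLE to -1000 are assumptions for illustration; only
the expression (oom_adj * 1000 / OOM_ADJUST_MAX) comes from the text
above.

```c
#include <assert.h>

#define OOM_DISABLE     (-17)
#define OOM_ADJUST_MIN  (-16)
#define OOM_ADJUST_MAX  15

/* Hypothetical mapping of a legacy oom_adj value onto the proposed
 * -1000..+1000 range. */
static int oom_adj_to_scaled(int oom_adj)
{
    int scaled;

    assert(oom_adj >= OOM_DISABLE && oom_adj <= OOM_ADJUST_MAX);

    /* Assumption: the "disable" value maps to the bottom of the range. */
    if (oom_adj == OOM_DISABLE)
        return -1000;

    scaled = oom_adj * 1000 / OOM_ADJUST_MAX;

    /* OOM_ADJUST_MIN (-16) would give -1066, so clamp (assumption). */
    if (scaled < -1000)
        scaled = -1000;

    return scaled;
}
```

With integer division, oom_adj values 1 and 2 map to 66 and 133, so
adjacent settings sit 66 or 67 units apart and nothing in between is
reachable. Under the clamping assumed here, -15, -16 and -17 all
collapse to -1000.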

Given your citation, I don't think we've ever described /proc/pid/oom_adj
outside of the implementation as a bitshift, either. So its use right now
for anything other than OOM_DISABLE is probably based on scalar thinking.
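
For reference, a simplified sketch of the bitshift behavior referred to
above. This is a standalone illustration and not the kernel source; the
function name legacy_adjust() is made up for this example, and
OOM_DISABLE is assumed to be handled separately by skipping the task
entirely.

```c
#include <assert.h>

/* Sketch: apply a legacy oom_adj in the range -16..+15 to a badness
 * score as a left or right shift. */
static unsigned long legacy_adjust(unsigned long points, int oom_adj)
{
    assert(oom_adj >= -16 && oom_adj <= 15);

    if (oom_adj > 0) {
        /* Make sure a zero score can still be raised. */
        if (!points)
            points = 1;
        points <<= oom_adj;
    } else if (oom_adj < 0) {
        points >>= -oom_adj;
    }

    return points;
}
```

Each step doubles or halves the score, so an oom_adj of 3 multiplies it
by 8 instead of adding a fixed amount, which is why treating the value
as a linear scalar is misleading.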