Re: [why oom_adj does not work] Re: Linux killed Kenny, bastard!

From: Balbir Singh
Date: Tue Jan 13 2009 - 10:00:32 EST


On Tue, Jan 13, 2009 at 7:54 PM, Evgeniy Polyakov <zbr@xxxxxxxxxxx> wrote:
> On Tue, Jan 13, 2009 at 02:06:27PM +0000, Alan Cox (alan@xxxxxxxxxxxxxxxxxxx) wrote:
>> > Do you _REALLY_ think anyone can calculate it yourself and then properly
>> > calculate adjustment used to properly select oom-killed process?
>>
>> Its always a heuristic.
>
> For the system which knows what it is. User does not and really can not
> work with it, since there is no sane way to implement that heuristic in
> the applications or even in (theoretically possible) monitor daemon.
>
> So, effectively, oom adjustment does not work.
>
>> > So far my patch is the sanest way to deal with the OOM selection
>>
>> No. You keep maintaining this but your crude hack is useless in a non
>> co-operative environment, has lots of issue with name aliasing and
>> doesn't deal with real needs.
>
> It is created because of real needs. Because people need to control the
> behaviour of the system and they want to control which application will
> be killed to free the memory. Attached patch is not the best solution,
> but it works for the all cases I can think about.
>

Where does this end? Tomorrow you'll add an interface for applications
that should *not* be killed? What sort of a heuristic is name? I think
the only name the kernel knows about is "init".

> Let's take you 'name aliasing' claim: if there are several processes
> with the same name, system will select the one with the worst score
> according to the own magical algorithm. So it will not kill random
> process just because it happend to have ricky name.
>

Having a name in the kernel is like building a hit-list, why can't the
examples that Alan sent work for you?
Names are tricky as well, if someone used a symbolic link to the
application with a different name, they would no longer be candidates
for OOM first? or vice-versa?

> And the same applies to the other issues. It just helps system to select
> the process to be killed according to userspace expectation of what
> should be killed to free the memory.
>
>> We have container interfaces that can do this and far more and do them
>> right. In fact the very start of all the OpenVZ and container work years
>> ago was the beancounter patches which were addressed at exactly this
>> problem (although more specifically 'making sure undergraduates processes
>> get killed first')
>
> Are the beancounters used to limit amount of virtual ram and not the
> physical one? It really does not work to limit for example some java
> machine which will ate all virtual space swapping out different node.
> It works for some (and likely the most, I do not argue this) cases and
> has overhead. But we are talking not about how to limit the processes,
> but what to do when we happend to have out-of-memory condition. And it
> happens all the time even if you put the processes into the separate
> container, since there are situations (that's why it was started at
> first), when you have a huge process which should not be killed and set
> of either its children or external processes, which should be checked
> and some of them (administrator would like to specify the less
> important) should be killed without much harm to the system.
>
> And patch I presented allows to do it. It introduces a hint for the
> killer on what processes should be checked first. It works exactly the
> way people work with their system: they run different application and
> expect some of them to be higher or lower priority when things come to
> the oom condition. No one ever proposes to kill exactly the process we
> select (although that may be a good idea in some cases), but instead to
> show that oom-killer should check given group first. The group
> administrator knows to be potentially harmless.
>

You can replace the lines of kernel code you wrote with a simple
one-line script that Alan sent out.

Balbir
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/