Re: [PATCH] fix spurious OOM kills

From: Thomas Gleixner
Date: Sun Nov 21 2004 - 09:07:36 EST


On Sun, 2004-11-21 at 13:17 +0100, Martin MOKREJŠ wrote:
> Why can't the algorithm first find the asking for memory now.
> When found, kernel should kill first it's children, wait some time,
> then kill this process if still exists (it might exit itself when children
> get closed).
> You have said it's safer to kill that to send ENOMEM as happens
> in 2.4, but I still don't undertand why kernel first doesn't send
> ENOMEM, and only if that doesn't help it can start after those 5 seconds
> OOM killer, and try to kill the very same application.
> I don't get the idea why to kill immediately.

I see your concern. There are some more changes neccecary to make this
reliably work. I'm not sure if it can be done without really big
changes. I will look a bit deeper into this.

> As it has happened to me in the past, that random OOM selection has killed
> sshd or init, I believe the algorithm should be improved to not to try
> to kill these. First of all, sshd is well tested, so it will never
> be source of memleaks. Second, if the algorithm would really insist on
> killing either of these, I personally prefer it rather do clean reboot
> than a system in a state without sshd. I have to get to the console.
> Actually, it's several kilometers for me. :(

Yeah, I observed this too and therefor came up with the whom to kill and
reentrancy patch.

> It's a pitty no-one has time to at least figure out why those changes have
> exposed this stupid random part of the algorithm. Before 2.6.9-rc2
> OOM killer was also started in my tests, but it worked deterministically.
> I wouldn't prefer extra algorithm to check what we kill now, I'd rather look
> why we kill randomly since -rc2.

As I said before the random behaviour was _not_ introduced in -rc2. It
might have changed in -rc2. The random kill with overkill can also be
triggered in 2.6.7 and 2.6.8. I have not tried elder versions though.

tglx


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/