Re: A true story of a crash.

Theodore Y. Ts'o (tytso@mit.edu)
Fri, 14 Aug 1998 23:44:49 -0400


Date: Fri, 14 Aug 1998 13:15:26 -0500
From: Ian and Iris <brooke@mail.jump.net>

After some thought, you consider that fork-bombs are nowhere near as
common on a relatively well-behaved "Personal" system as is running
out of memory. Thus, it makes sense to kill the largest process not
owned by root unless there are no more, then the largest process
owned by root as long as it's not init, then just give up on the
theory that if init wants to take down the system there are other,
larger problems.

Why largest? It's probably the out-of-control one.
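
[A minimal sketch of the selection order described above, not kernel code.
The Proc record, its field names, and the use of resident size as the
measure of "largest" are all assumptions made for illustration.]

```python
from dataclasses import dataclass
from typing import List, Optional

INIT_PID = 1   # assumption: init is always pid 1
ROOT_UID = 0


@dataclass
class Proc:
    """Hypothetical process record; not a real kernel structure."""
    pid: int
    uid: int
    rss_kb: int
    name: str = ""


def pick_victim(procs: List[Proc]) -> Optional[Proc]:
    """Pick the process to kill, or None if only init is left."""
    # init is never a candidate, whatever its size.
    candidates = [p for p in procs if p.pid != INIT_PID]
    # Prefer processes not owned by root; fall back to root's processes.
    non_root = [p for p in candidates if p.uid != ROOT_UID]
    pool = non_root if non_root else candidates
    if not pool:
        return None  # give up: nothing but init remains
    # "Largest" is taken here to mean largest resident size.
    return max(pool, key=lambda p: p.rss_kb)
```

With a table holding init, a large root daemon, and two user processes,
pick_victim returns the larger user process even though the root daemon is
bigger.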

There's only one problem with this strategy --- which was originally
used by AIX, by the way. If the largest process happens to be the X
server, and the reason you're out of memory is that you have
lots and lots of (smaller) X programs running, the kernel will kill off
the X server, which will keep the system up and free lots of memory
(since not only will the X server exit, but all of the X client
applications will die too!).

However, users might not find that to be the most reasonable behaviour,
since they might lose a lot of work, and the kernel clearly killed many
more processes than it needed to.
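
[A toy illustration of the X server case, assuming a pure "kill the largest
process" policy. The process names, the sizes, and the display_server field
are invented for the example; no real measurements are implied.]

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Proc:
    """Hypothetical process record for this illustration only."""
    name: str
    rss_kb: int
    # Name of the X server this client is connected to, if any.
    display_server: Optional[str] = None


def kill_largest(procs: List[Proc]) -> Tuple[Proc, List[Proc]]:
    """Return the chosen victim and every process that dies with it."""
    victim = max(procs, key=lambda p: p.rss_kb)
    # Clients connected to a dead server exit too.
    lost = [p for p in procs
            if p is victim or p.display_server == victim.name]
    return victim, lost


# One X server and ten smaller clients (made-up sizes).
procs = [Proc("X", 30000)] + [
    Proc(f"client{i}", 6000, display_server="X") for i in range(10)
]
victim, lost = kill_largest(procs)
freed_kb = sum(p.rss_kb for p in lost)
```

Here no single client is as big as the server, so the server is chosen, and
all eleven processes are lost where killing one or two clients might have
been enough.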

Granted, it would be nice to make Linux handle this situation more
gracefully, but in general this is a very, very hard problem to handle
"correctly" in all cases. In my view, the general case solution is that
you should never let yourself get that badly overcommitted. For
performance reasons, I usually like to make sure I have enough memory so
that all or most of the time, everything I need is in core, and I don't
need to be swapping at all. I then keep the swap space for the emergency
cases when I need slightly more memory than I have, and I never let
myself get near the "redline" case at all.

The other strategy which probably works better is to kill off the
process which tried asking for memory when the kernel had trouble
servicing its request. This has the advantage that you avoid killing
the long-term, stable processes that aren't requesting new pages, even if
they are pretty big. Like your solution, it's an attempt to try to
kill off the out-of-control process, while avoiding the "benign"
processes.
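
[A minimal sketch of this second strategy as a toy page allocator. ToyKernel
and its methods are hypothetical names for illustration; this is not how any
real kernel is structured.]

```python
from typing import Dict


class ToyKernel:
    """Toy allocator: the process whose request fails is the one killed."""

    def __init__(self, total_pages: int) -> None:
        self.free_pages = total_pages
        self.owned: Dict[int, int] = {}  # pid -> pages currently held

    def request(self, pid: int, pages: int) -> bool:
        """Grant the pages, or kill the requester if they are not available."""
        if pages > self.free_pages:
            self.kill(pid)
            return False
        self.free_pages -= pages
        self.owned[pid] = self.owned.get(pid, 0) + pages
        return True

    def kill(self, pid: int) -> None:
        """Release everything the process held."""
        self.free_pages += self.owned.pop(pid, 0)


kernel = ToyKernel(total_pages=100)
kernel.request(1, 60)                # big, stable process; asks once
kernel.request(2, 30)                # smaller process that keeps growing
survived = kernel.request(2, 20)     # cannot be satisfied, so pid 2 dies
```

The large process (pid 1) is untouched because it was not the one asking for
memory when the allocator ran dry. The sketch also shows the limit of the
idea: whichever process happens to ask at the wrong moment is the one that
dies.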

- Ted
