Re: Andrea VM changes

From: Jonathan Lundell
Date: Sun Aug 31 2003 - 14:14:01 EST


At 8:51am -0700 8/31/03, Dan Kegel wrote:
In the test-and-measurement system I'm developing,
it turned out the sanest thing to do with OOM conditions
was to consider them user errors, and to handle them
by dumping memory usage info about processes and slab caches,
then halt. It's been very helpful because it turns flaky
conditions into rock-solid failures. Too bad this drastic
approach isn't appropriate for general use.

Likewise in an HA environment, if you've got a standby node available, we prefer to fail over on an oom condition or (or an oops, for that matter) than to try to continue running in some randomly crippled way. The node in question can then reboot and return to service as a standby.

Ideally, we'd have a notifier that would be triggered for every unanticipated process kill (oom, oops, whatever).
--
/Jonathan Lundell.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/