Re: A true story of a crash.

Matt Agler (matagl@sypher.com)
Sat, 15 Aug 1998 13:58:42 -0400 (EDT)


On Sat, 15 Aug 1998 linker@z.ml.org wrote:
>
> You could also check what processes have network sockets open.
>
> You don't want to kill:
[snip]
> You sigterm, and log:
[cut]
> Then, if you are still tight you sigterm:
[more cut]
> Then you kill the above.
>
> Then you wait a user-definable timeout, and either reboot the computer or
> kill everything except init and signal init to restart.
>

Hmm, doesn't that seem a bit complicated? The whole problem here is that
the computer really has no knowledge of what should and should not be
killed. You're just making elaborate guesses. The kernel can't read the
user's mind to find out which process is least important. There's no
static mapping from size, priority, resource use, etc. to importance.

It would be better and simpler to let the user or admin decide what to
kill. Instead of killing a process, we should put it to sleep.

If the machine has overextended itself, we're probably swapping like mad
already. It's hammered. We're not getting anything done. We don't need
efficiency anymore. We want recovery without losing in-progress work.

For example, let's put to sleep each process that asks for a page we
can't give (from do_no_page?). This would be a special sleep in that the
process doesn't wake up until free memory returns to a certain threshold.
Meanwhile, its pages would age and get thrown out. Other processes would
complete. The load would drop until the machine was recoverable.
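
To make the sleep-until-threshold idea concrete, here is a toy userspace
model in Python. It is only a sketch, not kernel code: the names Proc,
Memory, request_page, release and wake_threshold are all made up for
illustration, and real page aging is reduced to an explicit release()
call. A woken process is simply marked runnable; it would have to retry
its page request.

```python
from dataclasses import dataclass, field


@dataclass
class Proc:
    name: str
    resident: int = 0        # pages currently held by this process
    sleeping: bool = False   # parked in the special low-memory sleep


@dataclass
class Memory:
    free: int                # pages currently free
    wake_threshold: int      # free pages required before sleepers resume
    sleepers: list = field(default_factory=list)

    def request_page(self, proc):
        """Stand-in for the point where a fault needs a fresh page."""
        if self.free > 0:
            self.free -= 1
            proc.resident += 1
            return True
        # No page to give: park the process instead of killing it.
        if not proc.sleeping:
            proc.sleeping = True
            self.sleepers.append(proc)
        return False

    def release(self, proc, pages):
        """Pages come back: proc exited, was killed, or had pages aged out."""
        pages = min(pages, proc.resident)
        proc.resident -= pages
        self.free += pages
        self._maybe_wake()

    def _maybe_wake(self):
        # Sleepers stay parked until free memory is back at the threshold.
        if self.free >= self.wake_threshold:
            for p in self.sleepers:
                p.sleeping = False
            self.sleepers.clear()


mem = Memory(free=2, wake_threshold=2)
hog, editor = Proc("hog"), Proc("editor")
mem.request_page(hog)
mem.request_page(hog)       # hog now holds both pages
mem.request_page(editor)    # nothing left: editor is put to sleep
mem.release(hog, 1)         # one page free, still below the threshold
mem.release(hog, 1)         # threshold reached: editor is woken
```

Waking only at a threshold, rather than as soon as a single page is
free, is what keeps the sleepers from piling straight back onto a
machine that has barely recovered.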

Root could then log in and fix the problem: add swap, kill stuff, whatever.
Voila, the kernel didn't have to read the user's mind and it stayed
responsive.

Admittedly, root would need to allocate memory, so any root processes
should probably be exempt. If the box was administered right, I think
this would be a workable scheme. ext2fs does a similar thing by
reserving space for root, so there's a precedent here, I think.
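
One possible reading of the root exemption, borrowing the ext2fs idea of
a reserve, is sketched below in Python. This is an assumption about how
it might look, not a description of any existing kernel interface: the
function may_allocate and the reserved_pages parameter are hypothetical.

```python
ROOT_UID = 0


def may_allocate(free_pages, reserved_pages, uid):
    """Return True if a process owned by uid may take one more page.

    Ordinary users may not dip into the last reserved_pages pages;
    root may use anything that is still free.
    """
    if uid == ROOT_UID:
        return free_pages > 0
    return free_pages > reserved_pages


# With 8 pages held back for root:
print(may_allocate(10, 8, 1000))   # ordinary user, above the reserve
print(may_allocate(8, 8, 1000))    # ordinary user, would eat the reserve
print(may_allocate(8, 8, 0))       # root may use the reserve
```

An ordinary process refused here would go into the special sleep
instead of being killed, while root's login and shell could still get
the pages they need.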

-Matt

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.altern.org/andrebalsa/doc/lkml-faq.html