Re: processes/threads monitor

From: Valdis . Kletnieks
Date: Sat Jan 31 2009 - 22:14:33 EST


On Sat, 31 Jan 2009 22:57:36 +0100, Angelo Borsotti said:

> The need occurs on Linux installations in which there are some process
> control applications, or some continuous server applications that need
> some resilience towards process failures.
> When a process providing an essential service terminates, some
> "process monitor" would create it again, restoring the service
> automatically (and without the need for human intervention).

Usually, this is easily done with any sane /sbin/init, by sticking a
line in /etc/inittab that looks like:

vip:12345:respawn:/usr/bin/my-important-process

It terminates, it gets respawned.
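
For anyone who hasn't looked at what "respawn" amounts to, here is a rough
Python sketch of the idea. This is not what init actually runs; the function
name respawn() and its parameters (max_runs, min_interval, sleep) are made up
for illustration, and the throttle is only loosely modelled on the way
sysvinit backs off from entries that respawn too fast.

```python
import subprocess
import sys
import time


def respawn(argv, max_runs=None, min_interval=1.0, sleep=time.sleep):
    """Run argv and start it again every time it exits.

    Hypothetical helper, not part of any init system.
    max_runs=None means "forever"; a number is only useful for testing.
    Returns the list of exit codes seen.
    """
    exit_codes = []
    while max_runs is None or len(exit_codes) < max_runs:
        started = time.monotonic()
        code = subprocess.call(argv)  # blocks until the child terminates
        exit_codes.append(code)
        print(f"{argv[0]} exited with {code}, respawning", file=sys.stderr)
        # Crude throttle so a child that dies at startup doesn't spin the CPU.
        elapsed = time.monotonic() - started
        if elapsed < min_interval:
            sleep(min_interval - elapsed)
    return exit_codes
```

Calling respawn(["/usr/bin/my-important-process"]) would loop forever. Note
that it only notices a child that actually exits, which is exactly the
limitation discussed next.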

The *tricky* case isn't dealing with a process that manages to terminate;
the real ugliness starts when you have a process that becomes wedged
but fails to terminate (which can happen for any number of reasons: it
went into an infinite loop, or two threads managed to deadlock, or....)

And unfortunately, there's no clean general-purpose way to do *that*, mostly
because there's no good config language that allows you to code stuff like
"if Apache wedges up and fails to respond within 0.5 seconds, shoot it
and restart it, unless you have detected that the failure is actually due to
unrelated cause A, in which case do X, or if B happened, then do Y, or if
C happened, then do Z, or...."
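
To make that concrete, the easy half ("no answer in 0.5 seconds, shoot it and
restart it") can be sketched in a few lines of Python. Everything here is an
assumption for illustration: watch(), tcp_probe() and their parameters are
hypothetical names, and the probe only checks that a TCP connect succeeds,
which a wedged server may still pass. The "unless cause A, then X" part is
precisely what this sketch does not attempt.

```python
import socket
import subprocess
import time


def tcp_probe(host, port, timeout=0.5):
    """Return a callable that reports whether host:port accepts a connection.

    Hypothetical health check. A real one would send a request and read a
    reply, since the kernel can accept connections for a wedged process.
    """
    def probe():
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False
    return probe


def watch(argv, is_healthy, interval=5.0, max_checks=None, sleep=time.sleep):
    """Restart argv when it has exited or when is_healthy() returns False.

    max_checks=None means "forever"; a number is only useful for testing.
    Returns the number of restarts performed.
    """
    child = subprocess.Popen(argv)
    restarts = 0
    checks = 0
    try:
        while max_checks is None or checks < max_checks:
            sleep(interval)
            checks += 1
            exited = child.poll() is not None
            if exited or not is_healthy():
                if not exited:
                    child.kill()  # SIGKILL: a wedged process may ignore SIGTERM
                    child.wait()  # reap it so it doesn't linger as a zombie
                child = subprocess.Popen(argv)
                restarts += 1
    finally:
        # Don't leave the last child running when the watcher stops.
        if child.poll() is None:
            child.kill()
        child.wait()
    return restarts
```

Something like watch(["/usr/bin/my-important-process"], tcp_probe("127.0.0.1", 80))
would cover the simple case. The decision is a single yes/no from
is_healthy(), so any knowledge about *why* the service stopped answering has
to be hand-coded into that function for each application.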

If you have real requirements for all-the-time continuous operations, then
you don't *really* want a monitor anyhow. What you *want* is redundant
hardware with some good high-availability clustering software to do hot failover.

Because if your server comes to a screeching halt because a memory card or
other hardware just bit the dust, that monitor isn't going to do *anything* for you.

Plus - if you can't do failover, how do you do a system upgrade if needed?
