Re: I Give Up!

Dale Amon (amon@vnl.com)
Thu, 23 Dec 1999 21:19:41 +0000 (GMT)


Over the years I've seen problems like this crop up.
It's a memory leak, and it could be in the kernel or it could
be in one of your own userland programs. I'm currently
battling one in a daemon written in Perl that leaks
enough to go into a swap-thrashing lockup every 6 days.
We fixed it temporarily with a cron job that reboots the
machine every 6 days...

Back in 2.0.28-32 days we had one server, basically the
same as all the other servers, same software versions...
that went into swap lockup every couple of days. It also had
a BIOS problem: soft reboot did not work. We solved that
one with a cron job halt and a mechanical timer that power
cycled it after the halt... The things you do in an
engineering/production environment when The Show Must Go On...

That one was almost certainly a kernel leak. It just went
away with the later 2.0.3x versions.

I'd keep a top running on the machine. When the lockup
approaches you actually have loads of time to react if
you are watching.
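
If nobody can sit in front of top, something like the rough
Python sketch below can log the same numbers for you. It is
only an illustration: it assumes /proc/meminfo uses the
"Name: value kB" layout, and the function names (parse_meminfo,
swap_used_kb, log_line) are made up for this example.

```python
import time


def parse_meminfo(text):
    """Return {field name: value in kB} from /proc/meminfo-style text.

    Assumes "Name:   12345 kB" lines; anything else is skipped.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def swap_used_kb(info):
    """Swap in use, in kB, from a dict returned by parse_meminfo()."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


def log_line(path="/proc/meminfo"):
    """Read the file once and format a timestamped one-line summary."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return "%s MemFree=%d kB SwapUsed=%d kB" % (
        stamp, info.get("MemFree", 0), swap_used_kb(info))
```

Call log_line() once a minute from a loop or from cron and
append the result to a file. SwapUsed creeping upward day
after day is the warning sign.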

Keep an eye out for a particular process that is continuously
increasing in size until it is too big for swap. If you
don't see one, then the leak is probably in the kernel, and
you can tell that from overall memory and swap usage climbing
while no single process grows.
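
The hunt for a growing process can be automated along these
lines. This is a sketch under assumptions, not a finished
tool: it assumes a procps-style ps that accepts
"-eo pid,vsz,comm", and parse_ps, snapshot and find_growing
are names invented for the example.

```python
import subprocess


def parse_ps(text):
    """Parse `ps -eo pid,vsz,comm` output into {pid: (vsz_kb, command)}."""
    procs = {}
    for line in text.splitlines():
        fields = line.split(None, 2)
        # The header line ("PID VSZ COMMAND") fails the isdigit checks.
        if len(fields) == 3 and fields[0].isdigit() and fields[1].isdigit():
            procs[int(fields[0])] = (int(fields[1]), fields[2].strip())
    return procs


def snapshot():
    """Take one snapshot of every process's virtual size."""
    out = subprocess.run(["ps", "-eo", "pid,vsz,comm"],
                         capture_output=True, text=True, check=True).stdout
    return parse_ps(out)


def find_growing(snapshots, min_growth_kb=0):
    """Return [(pid, command, growth_kb)], biggest growth first.

    A process qualifies if it appears in every snapshot, never shrank
    between consecutive snapshots, and grew by more than min_growth_kb.
    """
    suspects = []
    for pid, (_, command) in snapshots[0].items():
        if not all(pid in snap for snap in snapshots):
            continue
        sizes = [snap[pid][0] for snap in snapshots]
        never_shrank = all(b >= a for a, b in zip(sizes, sizes[1:]))
        growth = sizes[-1] - sizes[0]
        if never_shrank and growth > min_growth_kb:
            suspects.append((pid, command, growth))
    return sorted(suspects, key=lambda s: s[2], reverse=True)
```

Collect a snapshot() every few minutes for an hour or so and
hand the list to find_growing(). An empty result while swap
usage keeps rising points away from userland.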

Worst comes to worst, a programmed reboot at 3am is
less disruptive than a locked up machine at 4pm.

------------------------------------------------------
Use Linux: A computer        Dale Amon, CEO/MD
is a terrible thing          Village Networking Ltd
to waste.                    Belfast, Northern Ireland
------------------------------------------------------
