Frozen machine, no obvious reason

Mihai Ibanescu (misa@dntis.ro)
Thu, 27 Nov 1997 18:48:52 +0200 (EET)


Hello

Sorry for this incomplete message. It is not a bug report; I am
just trying to understand what is going on.
So, I have an FTP server and a (quite loaded) NFS server (lots of
exports). 2.0.27 used to run quite well on this machine. Because of the
f00f bug and the teardrop attack, I had to upgrade to a more recent
version. After 7 days of 2.0.32, I noticed some very strange behaviour:
no process was using more than 1% CPU and nothing special was running,
yet the load was:

5:35pm up 7 days, 23:34, 7 users, load average: 8.29, 6.62, 5.29
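
A note on reading that figure: on Linux the load average counts not
only runnable processes but also processes in uninterruptible sleep
(state D, usually waiting on disk or network I/O), so the load can be
high while the CPU is mostly idle. The sketch below is only an
illustration of how one might check for this. The function names are
made up for the example, and it assumes /proc/<pid>/stat has the usual
"pid (comm) state ..." layout.

```python
import os
import re


def parse_load(uptime_line):
    """Return the 1, 5 and 15 minute load averages from an uptime line."""
    match = re.search(
        r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load average found in: %r" % uptime_line)
    return tuple(float(value) for value in match.groups())


def state_from_stat(stat_text):
    """Return the state letter from the text of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may contain spaces,
    so split after the last ')' instead of on whitespace alone.
    """
    return stat_text.rsplit(")", 1)[1].split()[0]


def count_states(stat_texts):
    """Count how many processes are in each state (R, S, D, Z, ...)."""
    counts = {}
    for text in stat_texts:
        state = state_from_stat(text)
        counts[state] = counts.get(state, 0) + 1
    return counts


def read_proc_stats(proc_dir="/proc"):
    """Read every /proc/<pid>/stat that is still readable (Linux only)."""
    texts = []
    if not os.path.isdir(proc_dir):
        return texts
    for entry in os.listdir(proc_dir):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_dir, entry, "stat")) as handle:
                texts.append(handle.read())
        except OSError:
            # The process exited between listdir() and open().
            pass
    return texts


if __name__ == "__main__":
    print(count_states(read_proc_stats()))
```

If the count for "D" is close to the load average while "R" stays near
zero, the load is coming from processes blocked on I/O rather than from
CPU use.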

free was reporting only 400k of free memory out of 64 megs, and a
few megs in swap. Sorry, I did not save the output of free from that time.

Well, for a couple of hours everything was crawling, and I
suspected something was wrong in NFS, so I shut it down (client and
server). Nothing changed: the load was still around 5. Then, after a
while, I got:

5:43pm up 7 days, 23:43, 7 users, load average: 21.18, 12.64, 8.15

so it was no surprise that nothing was working anymore: no getty, no
init.
I know it's a poor report, but I don't have anything more. Do you have
any ideas about what was happening?
I can add a few details:

1. I do have a vortex in that machine (and after some digging in
linux-kernel, I realized it might be the cause). For those of you who
have seen lockups with vortex: is this what a vortex lockup looks like?

2. I don't have untrusted users on this machine and, as I said, there was
no compiling or anything like that. I am pretty sure the CPU was about
80% idle on average.

3. The kernel is compiled with very little in it: just standard IP
plus NFS. No modules.

So, how can I tell if there is a memory leak? How can I find out
how much memory the kernel is using?
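
To make the question concrete, here is a rough sketch of the kind of
check I have in mind, not a definitive method. It assumes /proc/meminfo
contains "Name: value kB" lines with at least MemTotal, MemFree, Buffers
and Cached; the function names are invented for the example. What is
left after subtracting free memory, buffers and cache still includes
user process memory, so it is only an upper bound on what the kernel
holds. Logged every few minutes, a figure that keeps growing while the
processes stay the same size would hint at a leak.

```python
def parse_meminfo(text):
    """Map each 'Name: value kB' line of /proc/meminfo to an int (kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip header or table lines that do not look like 'Name: <n> kB'.
        if sep and len(parts) == 2 and parts[0].isdigit() and parts[1] == "kB":
            fields[name.strip()] = int(parts[0])
    return fields


def unaccounted_kb(fields):
    """Memory that is neither free, nor buffers, nor page cache (kB).

    This still includes user process memory, so treat it as an upper
    bound on kernel usage, not as an exact figure.
    """
    return (
        fields["MemTotal"]
        - fields["MemFree"]
        - fields["Buffers"]
        - fields["Cached"]
    )


if __name__ == "__main__":
    with open("/proc/meminfo") as handle:
        info = parse_meminfo(handle.read())
    print("unaccounted: %d kB" % unaccounted_kb(info))
```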

I would appreciate any suggestions.

Thanks for the effort,

Mihai

Mihai Ibanescu Dynamic Network Technologies
http://sysadm.dntis.ro/~misa Moara de Foc 35, et. 7, 6600 Iasi
misa@dntis.ro tel. +40-32-252936