2.0.37 locked on high loaded server

Hubert Tonneau (hubert.tonneau@easynet.fr)
Wed, 07 Jul 1999 10:13:16 +0200


A few weeks ago I sent a message specifying that my main
server running 2.0.36 locked.
Alan replyed that I should run 'dmesg' in order to provide
helpfull infomations and he guessed that it would report
tons of disk errors.

It appent once again (while running 2.0.37).
I run 'dmesg' and saw that Alan was perfectly right: the Buslogic
driver is continuesly complaining about failure on the first
SCSI disk.

What's seem still wrong to me (when the problem appends) is the
following:
- some processes cannot be killed using 'kill -9'
- 'reboot' command does not work
As a result, it is impossible for me to remotely recover from
this error (the server is 100 miles away from me).
- I had somebody press the reset button on the server (the
SCSI disks are external and where not power cycled), and
now everything is fine again, so I wonder if it would realy be
impossible for Linux to recover correctly by itself.

PS: I have no more the exact content of 'dmesg' because I wrote
it down to file on the root disk (IDE), but the file had vanished
when the machine went up again after the reboot: it seems that
problems on one disk controler also make other ones work sadely,
maybe due to some locks in the cache flusher process.
Next time, I will FTP it before rebooting the server :-)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/