Re: 2.0.37 locked on high loaded server

Miquel van Smoorenburg (miquels@cistron.nl)
7 Jul 1999 13:17:17 +0200


In article <cistron.37830C1C.E8BFFC5D@easynet.fr>,
Hubert Tonneau <hubert.tonneau@easynet.fr> wrote:
>I run 'dmesg' and saw that Alan was perfectly right: the Buslogic
>driver is continuesly complaining about failure on the first
>SCSI disk.

So, your disk might be broken.

>What's seem still wrong to me (when the problem appends) is the
>following:
>- some processes cannot be killed using 'kill -9'
>- 'reboot' command does not work

That's because disk I/O is blocked, the SCSI disk doesn't have access
to some sectors anymore, and the system gets into a loop retrying.

>As a result, it is impossible for me to remotely recover from
>this error (the server is 100 miles away from me).
>- I had somebody press the reset button on the server (the
> SCSI disks are external and where not power cycled), and
> now everything is fine again, so I wonder if it would realy be
> impossible for Linux to recover correctly by itself.

I've had this a couple of times. Usually the binaries that are needed
to boot the system are on a part of the disk that is still intact,
probably because you only _read_ from that part of the disk. As soon
as you start accessing / writing the defective parts of the disk,
the trouble starts.

Replacing the disk usually helps. Oh and make sure you have good
ventilation in the case - heated up disks usually demise prematurely.

Mike.

-- 
Beware of Programmers who carry screwdrivers.

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/