So, your disk might be broken.
>What still seems wrong to me (when the problem happens) is the
>following:
>- some processes cannot be killed using 'kill -9'
>- 'reboot' command does not work
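That's because disk I/O is blocked: the SCSI disk can no longer access
some sectors, and the system gets stuck in a retry loop. Processes
waiting on that I/O sleep uninterruptibly, so they never get to act on
the signal, and 'reboot' hangs for the same reason.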
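
If you can still get a shell the next time it happens, you can check
which processes are stuck this way: they show up in state "D"
(uninterruptible sleep) in ps. Below is a rough sketch in Python that
does the same by reading /proc directly. It assumes a Linux-style /proc
with the usual "pid (comm) state ..." layout in /proc/<pid>/stat; the
function names parse_stat_line and find_blocked are made up for this
example and not part of any tool.

```python
import os


def parse_stat_line(text):
    """Split one /proc/<pid>/stat line into (pid, command, state)."""
    # The command name sits in parentheses and may itself contain spaces
    # or parentheses, so look for the last ')' instead of splitting on
    # whitespace.
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren])
    comm = text[open_paren + 1:close_paren]
    state = text[close_paren + 1:].split()[0]
    return pid, comm, state


def find_blocked(proc_root="/proc"):
    """Return (pid, command) for every process in state 'D'."""
    blocked = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return blocked  # no /proc here (hypothetical non-Linux host)
    for entry in entries:
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or unreadable entry
        if state == "D":
            blocked.append((pid, comm))
    return blocked


if __name__ == "__main__":
    for pid, comm in find_blocked():
        print(pid, comm)
```

A process that stays in this list for minutes on end is most likely
waiting on the failing disk. Note that the script itself has to be
loadable from a part of the disk that still works.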
>As a result, it is impossible for me to remotely recover from
>this error (the server is 100 miles away from me).
>- I had somebody press the reset button on the server (the
> SCSI disks are external and were not power cycled), and
> now everything is fine again, so I wonder if it would really be
> impossible for Linux to recover correctly by itself.
I've had this a couple of times. Usually the binaries that are needed
to boot the system are on a part of the disk that is still intact,
probably because you only _read_ from that part of the disk. As soon
as you start accessing / writing the defective parts of the disk,
the trouble starts.
Replacing the disk usually helps. Oh, and make sure you have good
ventilation in the case - overheated disks tend to die prematurely.
Mike.
-- Beware of Programmers who carry screwdrivers.