Re: Oddness with 2.0.29...

Oskar Pearson (oskar@is.co.za)
Fri, 23 May 1997 16:10:30 +0200


"Edward S. Marshall" writes:
> Just had a rather disturbing experience with 2.0.29...
>
> One of our machines here had been running smoothly for about 3-4 weeks,
> when suddenly one of our user's processes hung, and couldn't be killed.
> Period. The process simply wouldn't die.
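This is normally caused by "disk wait": the process is stuck in
uninterruptible sleep (state D in ps).

The process is probably waiting for a disk response before it can die,
and no signal, not even kill -9, is delivered until that wait finishes.

This normally happens when your disk is dying.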
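
If you want to confirm that this is what you are looking at, here is a
rough sketch of how one might list the processes sitting in disk wait by
reading /proc directly. It assumes a Linux procfs mounted at /proc; the
function names are just made up for illustration, and "ps" will show you
the same thing in its STAT column.

```python
import os


def parse_stat_line(line):
    """Split one /proc/<pid>/stat line into (pid, command, state)."""
    # The command name sits in parentheses and may itself contain spaces
    # or parentheses, so split on the first "(" and the last ")".
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left].strip())
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def find_disk_wait(proc_root="/proc"):
    """Return (pid, command) for every process currently in state D."""
    stuck = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return stuck  # no procfs available at this path
    for entry in entries:
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process went away or the line was unreadable
        if state == "D":
            stuck.append((pid, comm))
    return stuck


if __name__ == "__main__":
    for pid, comm in find_disk_wait():
        print(pid, comm)
```

A process that shows up here on every run, rather than just now and then,
is the kind that will not die until whatever it is waiting on answers.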

> The problem came in when we tried to shutdown the system to resolve the
> problem. The process was holding a few files open on one of the
> partitions, and as such, it wasn't possible to unmount the partition
> cleanly, which meant a bit of data loss at reboot.
I have seen this before too.

In fact, with an older version of wu-ftpd, if you sent its processes a
'kill' with the -15 signal, they would go into a state where a kill -9
wouldn't kill them either.

Thus, when shutting our ftp server down, we would always end up with
disks that had not been unmounted cleanly.

Upgrading wu-ftpd fixed this.

> Two questions:
>
> b) Any suggestions for handling the problem? Does 2.0.30 have any
> additions or fixes which might affect the situation we had?

It's normally caused by the disk. Make sure that the machine is syslogging
kernel errors, preferably to both the console and syslog, so that if it
happens again you will (hopefully) have it logged.
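
Something along these lines in /etc/syslog.conf should do it. This is
only a sketch in classic syslogd syntax; the log file name is an
assumption, so pick whatever suits your layout. Some older syslogd
versions insist on tabs rather than spaces between the two columns, and
on Linux klogd has to be running for kernel messages to reach syslogd at
all.

```
# Send all kernel messages to the console and to a file.
kern.*		/dev/console
kern.*		/var/log/kernel
```

Remember to send syslogd a HUP after editing the file so that it rereads
its configuration.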

Oskar