I'm trying to debug some mysterious qpopper hangs, where the first few
qpoppers activated after booting my mail server hang indefinitely, keeping
the load average up and making it impossible to unmount the filesystems on
shutdown/reboot.
Anyway, when I take a peek in /proc/221 (one of the hung processes), I see
the following:
'ps ax |grep 221' says:
221 ? D 0:00 in.qpopper -b /var/spool/bulletins
and 'ls -l /proc/221/fd' says:
total 0
lrwx------ 1 root root 64 Apr 7 10:25 0 -> [0000]:1379
lrwx------ 1 root root 64 Apr 7 10:25 1 -> [0000]:1379
lrwx------ 1 root root 64 Apr 7 10:25 2 -> [0000]:1379
lrwx------ 1 root root 64 Apr 7 10:25 3 -> [0802]:282639
lr-x------ 1 root root 64 Apr 7 10:25 4 -> [0004]:11362
Just for the heck of it I tried doing "more /proc/221/fd/4" - and my more
process hung just like the popper did. Same 'D' status by ps (which 'man ps'
explained as 'uninterruptible sleep', while /proc/221/status called it 'disk
sleep'). So, I have the following questions:
- How do I find out what those file descriptors are pointing to?
- What could cause this?
- How do I fix it? :)
- How do I kill those processes?
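On the first question, a partial sketch: it assumes the link targets shown by
'ls -l /proc/221/fd' have the form [device]:inode, with the device number in
hex (major in the high byte, minor in the low byte) and the inode in decimal.
The helper name parse_fd_target is made up for illustration; this only splits
the string apart, it does not tell you which file the inode belongs to.

```python
import re

# Assumed format of a link target: "[MMmm]:inode", where MMmm is the device
# number in hex (major byte, then minor byte) and inode is decimal.
LINK_RE = re.compile(r"^\[([0-9a-fA-F]{4})\]:(\d+)$")


def parse_fd_target(target):
    """Split a '[0802]:282639'-style target into (major, minor, inode)."""
    match = LINK_RE.match(target)
    if match is None:
        raise ValueError("unrecognised fd target: %r" % target)
    dev = int(match.group(1), 16)
    return dev >> 8, dev & 0xFF, int(match.group(2))


if __name__ == "__main__":
    # Targets copied from the listing in this message.
    for target in ("[0000]:1379", "[0802]:282639", "[0004]:11362"):
        major, minor, inode = parse_fd_target(target)
        print("%s -> major %d, minor %d, inode %d" % (target, major, minor, inode))
```

If that reading is right, fd 3 is inode 282639 on device 8,2 (conventionally
the second partition of the first SCSI disk), and something like
'find <mountpoint> -xdev -inum 282639' run on that filesystem should name the
file. Major 0 is used for unnamed, non-disk devices, so fd 4 is probably not
on a local disk, but that much is a guess.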
I suspect that qpopper is checking a file in the user's home
directory (which is mounted via NFS from an old SunOS 4.1.3 box), and is
hanging there.
Problems like this make me wish for a 'super-kill' program, that just closes
a process's file descriptors and wipes it out by force, with no wimpy signal
stuff. Does such a tool exist? 'kill -9' doesn't help at all with
processes that get stuck like this, and having to wait for 10GB of disks to
fsck on each boot makes debugging this very, very slow and painful.
This problem appeared when we upgraded from a Pentium/75 processor to a new
Cyrix PR150+ processor.
--
Thanks in advance,
Bjarni R. Einarsson                       bre@margmidlun.is
[ THIS SPACE INTENTIONALLY LEFT BLANK ]   http://www.mmedia.is/~bre
Juggler@IRC                               http://www.mmedia.is/linux