mystery process of death

From: Ian Peters (itp@gnu.org)
Date: Tue Mar 07 2000 - 14:02:08 EST


Earlier today, under 2.3.49 (plus tytso's one-line patch to
softirq.c), commands like 'w' and 'ps' stopped returning. Instead
they played the whole "claim to be chewing up cycles and driving up
the load average while really doing nothing, because the system stays
responsive" game.

I wasn't local to the box, so I just opened up a lot of ssh
connections and tried to figure out what was going on before rebooting.
I strace(1)'d both 'w' and 'ps', and both blocked after output like:

--- SNIP ---
stat("/proc/217", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
open("/proc/217/stat", O_RDONLY) = 7
read(7, "217 (apache) S 202 202 202 0 -1 "..., 511) = 179
close(7) = 0
open("/proc/217/statm", O_RDONLY) = 7
read(7, "90 10 7 5 0 5 3\n", 511) = 16
close(7) = 0
open("/proc/217/status", O_RDONLY) = 7
read(7, "Name:\tapache\nState:\tS (sleeping)"..., 511) = 419
close(7) = 0
open("/proc/217/cmdline", O_RDONLY) = 7
read(7, "/usr/sbin/apache\0", 2047) = 17
close(7) = 0
open("/proc/217/environ", O_RDONLY) = 7
read(7, "PWD=/\0BOOT_FILE=/boot/vmlinuz-2."..., 2047) = 370
brk(0x8192000) = 0x8192000
close(7) = 0
stat("/proc/3183", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
open("/proc/3183/stat", O_RDONLY) = 7
read(7, "3183 (gpm) S 1 3183 3183 0 -1 32"..., 511) = 194
close(7) = 0
open("/proc/3183/statm", O_RDONLY) = 7
read(7, "24 13 8 8 0 5 5\n", 511) = 16
close(7) = 0
open("/proc/3183/status", O_RDONLY) = 7
read(7, "Name:\tgpm\nState:\tS (sleeping)\nPi"..., 511) = 406
close(7) = 0
open("/proc/3183/cmdline", O_RDONLY) = 7
read(7, "/usr/sbin/gpm\0-m\0/dev/psaux\0-t\0p"..., 2047) = 98
close(7) = 0
open("/proc/3183/environ", O_RDONLY) = 7
read(7, "PWD=/\0COLORFGBG=default;0\0XAUTHO"..., 2047) = 1183
close(7) = 0
stat("/proc/5700", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
open("/proc/5700/stat", O_RDONLY) = 7
read(7,
--- SNIP ---

Both 'ps' and 'w' blocked on the same process, 5700, in the read() of
/proc/5700/stat. What is process 5700, you might ask? I dunno; I
couldn't figure it out either. Whenever I tried to look in /proc
myself, my shell died too, of course.

So that's my mysterious story. I had to reboot the box because, even
though it runs 2.3.x kernels, the machine is occasionally useful, and
having random processes get stuck wasn't good. If anyone has ideas
about why it happened or how I can look harder the next time it
happens, or just wants to fix whatever bug it could be, I'd love to
hear them.
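
One way to look harder next time, as a minimal sketch rather than a
known fix: probe each /proc/<pid>/stat from a throwaway child process
with a timeout, so that a read that never returns wedges only the
child and not the shell doing the looking. The sketch is Python 3;
the names read_completes and find_stuck_pids and the 2-second timeout
are made up for illustration, and it assumes that listing /proc itself
still works, as it apparently did for 'ps' in the trace above. The
child is deliberately never waited on, since a process stuck inside
the kernel may not die when killed.

```python
import os
import subprocess
import sys
import time

# One-liner run in the child: open the file and read it to the end.
READER = "import sys; open(sys.argv[1], 'rb').read()"


def read_completes(path, timeout=2.0):
    """Return True if a child process finishes reading `path` in time.

    A child that exits with an error (for example because the file
    vanished) still counts as "completed"; only a hang returns False.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", READER, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if child.poll() is not None:
            return True
        time.sleep(0.05)
    # Best effort only. We do not wait() afterwards, because a child
    # blocked in the kernel might never exit and would hang us too.
    child.kill()
    return False


def find_stuck_pids(proc_root="/proc", timeout=2.0):
    """Return the PIDs under `proc_root` whose stat file cannot be read."""
    pids = sorted(int(n) for n in os.listdir(proc_root) if n.isdigit())
    stuck = []
    for pid in pids:
        stat_path = os.path.join(proc_root, str(pid), "stat")
        if not read_completes(stat_path, timeout):
            stuck.append(pid)
    return stuck
```

Calling print(find_stuck_pids()) on the affected box should, under
those assumptions, list the PIDs whose stat read hangs (5700 here, if
the hang really is in that read) while leaving the calling shell
usable. Any child that does get stuck will linger until the reboot.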

-- 
Ian Peters  | GnuPG Key ID 5C23D20C    | Those who know what's best
itp@gnu.org | E584 2558 FAC3 BEAB EFAC | for us must rise and save
itp@cmu.edu | FC74 CFED 7E24 5C23 D20C | us from ourselves...
