Communicator crash; kill -9 won't remove (pre-10!)

Rob Hagopian (hagopiar@vuser.vu.union.edu)
Tue, 7 Oct 1997 15:51:51 -0400 (EDT)


OK, I had another communicator crash that kill -9 won't remove. No locked
buffers, and strace acts funny.

Here's all the debug output that I know how to generate. I'd be happy to
provide anything else you need:

-----

ps -auxww | grep communicator | grep -v grep:
stepanep 1629 0.7 8.0 14660 10140 ? D 14:09 0:42 communicator
stepanep 1630 0.0 0.0 0 0 ? Z 14:09 0:00 (communicator <zombie>)

-----

strace -p 1630:
+++ killed by SIGKILL +++
strace -p 1630 (again):
attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted

strace -p 1629 hangs; won't be killed by HUP or Ctrl-C, only KILL...

-----

Ctrl-ScrlLck:
communicator 169 D 00000000 4 1629 1 1630 132
communicator 170 Z 00000000 4 1630 1629

Shift-ScrlLck:
Mem-info:
Free pages: 8568kB
( 350*4kB 322*8kB 131*16kB 16*32kB 1*64kB 15*128kB = 8568kB)
Swap cache: add 4496/4496, delete 57599876/4492, find 1120/0
Free swap: 130476kB
32768 pages of RAM
2155 free pages
1139 reserved pages
22585 pages shared
Buffer memory: 4116kB
Buffer heads: 1054
Buffer blocks: 1029
CLEAN: 706 buffers, 3 used (last=3), 0 locked, 0 protected, 0 dirty
LOCKED: 247 buffers, 34 used (last=231), 0 locked, 0 protected, 0 dirty
LOCKED1: 2 buffers, 0 used (last=0), 0 locked, 0 protected, 0 dirty
DIRTY: 26 buffers, 4 used (last=24), 0 locked, 0 protected, 26 dirty

Right-Alt-ScrlLck:
EIP: 0010:[<001096e2>] EFLAGS: 00000246
EAX: 00000000 EBX: 00000100 ECX: 001093f4 EDX: 00000022
ESI: 001c3ac0 EDI: 00000000 EBP: 00009000 DS: 0018 ES: 0018 FS: 0018 GS: 0018

-----

top:
3:47pm up 1 day, 15:29, 27 users, load average: 1.47, 1.31, 1.34
186 processes: 182 sleeping, 1 running, 2 zombie, 1 stopped
CPU states: 1.1% user, 5.2% system, 5.6% nice, 94.2% idle
Mem: 126516K av, 120760K used, 5756K free, 94208K shrd, 2796K buff
Swap: 130748K av, 272K used, 130476K free 59184K cached

PID USER PRI NI SIZE RSS SHARE STAT LIB %CPU %MEM TIME COMMAND
15226 root 11 0 580 580 400 R 0 3.2 0.4 0:01 top
336 root 6 0 796 780 364 S 0 1.7 0.6 33:57 rpc.nfsd
6 root 1 0 0 0 0 SW 0 0.3 0.0 7:34 md_thread
14704 vanderlm 1 0 500 500 400 S 0 0.3 0.3 0:00 telnet
5508 root 0 0 552 552 424 S 0 0.1 0.4 0:01 in.telnetd
5520 srinivas 1 0 800 800 596 S 0 0.1 0.6 0:02 irc
14746 cogevint 0 0 576 576 428 S 0 0.1 0.4 0:00 talk
1 root 0 0 272 272 204 S 0 0.0 0.2 0:21 init

-----

Notice how the load average sits at about 1.4 even though nothing is
using that much CPU? About 0.4 of it is NFS and normal CPU activity, but
the remaining 1.0 is the communicator process, even though it doesn't
show up here or in the %CPU field of ps -aux. (Yes, I'm sure: we once had
3 of these at the same time and the load scaled up to 3.xx, and I've
watched one crash and seen the loadavg rise right up to 1.xx.)
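
One way to double-check this reading of the numbers is to count process
states in the ps output. The sketch below is only an illustration: it
assumes the BSD-style column layout of the ps listing above (STAT as the
eighth field), and the names count_by_state and load_contributors are
made up for this example. It relies on the fact that the Linux load
average counts tasks in uninterruptible sleep (D) as well as runnable
ones (R), which would explain a load of 1.xx with almost no CPU in use.

```python
def count_by_state(ps_lines):
    """Count processes per state letter from `ps aux`-style lines.

    Assumes columns: USER PID %CPU %MEM SIZE RSS TTY STAT START TIME COMMAND.
    """
    counts = {}
    for line in ps_lines:
        fields = line.split()
        # Skip blank lines, short lines, and the header (PID is not numeric).
        if len(fields) < 8 or not fields[1].isdigit():
            continue
        state = fields[7][0]  # first letter of STAT, e.g. "D", "Z", "S", "R"
        counts[state] = counts.get(state, 0) + 1
    return counts


def load_contributors(counts):
    """Number of tasks that count toward the load average (R plus D)."""
    return counts.get("R", 0) + counts.get("D", 0)


if __name__ == "__main__":
    sample = [
        "stepanep 1629 0.7 8.0 14660 10140 ? D 14:09 0:42 communicator",
        "stepanep 1630 0.0 0.0 0 0 ? Z 14:09 0:00 (communicator <zombie>)",
    ]
    counts = count_by_state(sample)
    print(counts, load_contributors(counts))
```

Fed the two communicator lines from above, this reports one D task and
one zombie; only the D task adds to the load average, the zombie does not.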

Please! This shouldn't be happening! This is linux-2.0.31-pre-10 + an
alpha raid patch, but the raid patch doesn't touch anything relating to
process management.
-Rob H.