Re: Unkillable Zombie process under 2.6.3 and 2.6.4

From: David Fort
Date: Fri Mar 12 2004 - 11:43:41 EST


David Fort wrote:
[...]

Does it help to send a SIGCONT to all processes in T state?



No it doesn't, gdb loose its context when doing this(this triggers an internal gdb error):

lin-lwp.c:653: internal-error: stop_wait_callback: Assertion `pid == GET_LWP (lp->ptid)' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.

I was finally able to reproduce the unkillable zombie process. How to reproduce it:
- launch gdb with the program in the attached tarball.
- once the program is running you have 5 seconds to telnet on localhost on port 7899, and
sit on your keyboard.
-after 5 seconds the first popen occurs in one of the thread of kbug which causes gdb or the kernel to bug(waiting for new
child: No child processes.)

[dfo@chiffre kbug]$ ~/cvs/gdb/gdb/gdb ./kbug
GNU gdb 2004-03-12-cvs
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db library "/lib/tls/libthread_db.so.1".

(gdb) r 5 6
Starting program: /home/dfo/test/kbug/kbug 5 6
[Thread debugging using libthread_db enabled]
[New Thread 1073961248 (LWP 1882)]
[New Thread 1083698096 (LWP 1885)]
[New Thread 1092090800 (LWP 1886)]
1882(1083698096)do_task1: starting
1882(1073961248)main: starting
1882(1092090800)do_task1: starting
[New Thread 1100487600 (LWP 1888)]
1882(1100487600)remoteworker_thread_run: starting
remoteworker_thread_run(1100487600): readed 2 bytes [
]
[.. that's when i'm sending things via the telnet......]
]
remoteworker_thread_run(1100487600): readed 2 bytes [
]
waiting for new child: No child processes.
(gdb)



When here ask to quit, and gdb may hang:
(gdb) q
The program is running. Exit anyway? (y or n) y

Once this is done just kill gdb(killall gdb), here i get the following:
ps xwaf | grep kbug
5255 pts18 S 0:00 \_ grep kbug
1882 pts16 Z 0:00 [kbug] <defunct>

the kbug process became an unkillable zombie.
The provided trace is with the CVS of gdb but i have same behaviour with the
legacy gdb-6.0-2mdk

kbug:
kbug has a POPENFUNC which is a just call to popen(/bin/true)
threads of kbug:
-main thread creates things(other threads) and then call POPENFUNC
-task1 thread binds 7899, accepts the first connection and launch a thead "remoteworker" to treat the incoming connection, once this
is done it calls POPENFUNC
-remoteworker "select" the incoming connection and prints what is send
-task2 does only POPENFUNC
I can provide any information that is needed.

--
Fort David, Projet IDsA
IRISA-INRIA, Campus de Beaulieu, 35042 Rennes cedex, France
Tél: +33 (0) 2 99 84 71 33


Attachment: kbug2.tar.bz2
Description: BZip2 compressed data