Re: BUG: NTPL: waitpid() doesn't return?

From: Matthias Urlichs
Date: Sat Jan 31 2004 - 15:20:45 EST


Hi,

Linus Torvalds:
>
> "Clearly" is not correct. What I bet happens is:
>
> > 31346 execve("/usr/bin/apt-ftparchive", ["apt-ftparchive", "packages", "testing/all"], [/* 12 vars */] <unfinished ...>
> > 31346 <... execve resumed> ) = 0
> > 31346 exit_group(0) = ?
> > 31340 --- SIGCHLD (Child exited) @ 0 (0) ---
> > 31342 waitpid(31346, <unfinished ...>
>
> Notice how you do "waitpid()" for a _specific_ thread when you get a
> SIGCHLD.
>
> How the heck do you know _which_ thread it was that exited?

Please look again. The above trace clearly shows that #31346 has
exited, which is exactly the thread being waitpid()ed for.

What the program does is basically
- spawn four threads or so
- each thread forks off some process, and then waitpid()s for exactly
that pid

... and all but the last waitpid() never returns even though all four
child processes have exit_group()ed.

> Rule: never EVER wait for a specific thread in a SIGCHLD handler. When you
> get a SIGCHLD, your signal handler should just wait for "any thread".

No SIGCHLD has been installed. (I checked the strace output.)

--
Matthias Urlichs | noris network AG | http://smurf.noris.de/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/