Re: Possible bug in wait4(), 2.1.126-129 ?

Nick Holloway (Nick.Holloway@alfie.demon.co.uk)
Tue, 24 Nov 1998 08:35:51 GMT


> The real point is that what we see should never ever happen, regardless of
> libc, cron or pretty much anything the userspace can do. Only the current
> direct parent of a zombie can "wait" for it. Yet, in my test case the
> zombie effectively disappears without the parent waiting for it and before
> the parent itself finished. I checked that this is indeed the case by
> inserting a longer sleep() in the parent, and then checking the process
> table -- the child is gone for good! Needless to say, this completely
> breaks UNIX process semantics...

Ping! Penny drops...

There is one condition under which this can happen. If a process has
set the SIGCHLD handler to SIG_IGN, then the kernel will perform the wait
on the parent process's behalf (see arch/i386/kernel/signal.c:654).
This is a quick way to avoid zombies without the pain of calling wait.

In addition, ignored signals stay ignored across fork and exec, so the
cron job is run with SIGCHLD still set to SIG_IGN, as it is in cron.

So, it is a bug in cron that it isn't restoring the default handler
for SIGCHLD before running the cron job. It may be that the different
behaviour seen depends on the shell used -- it is possible that bash as
/bin/sh resets SIGCHLD.

Anyway, you can see the same behaviour from the command line if you
insert a "signal(SIGCHLD,SIG_IGN);" at the start of the C program.
Similarly, inserting "signal(SIGCHLD,SIG_DFL);" should fix it.
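
A minimal sketch of that experiment, assuming a system where ignoring
SIGCHLD makes the kernel reap children itself (the behaviour described
above). The helper name parent_can_reap is made up for this example.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: set the SIGCHLD disposition, fork a child that
 * exits at once, then try to wait for it.
 * Returns 1 if the parent reaped the child itself, 0 if the child was
 * already gone (waitpid failed with ECHILD), -1 on any other error.
 */
int parent_can_reap(void (*disposition)(int))
{
    pid_t pid;
    int status;

    if (signal(SIGCHLD, disposition) == SIG_ERR)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);               /* child: terminate immediately */

    if (waitpid(pid, &status, 0) == pid)
        return 1;               /* a zombie was there for us to collect */
    return errno == ECHILD ? 0 : -1;
}
```

Under that assumption I would expect parent_can_reap(SIG_IGN) to give 0
(the child vanishes without the parent waiting for it) and
parent_can_reap(SIG_DFL) to give 1.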

Anyway, it doesn't look like a kernel problem to me.

-- 
 `O O'  | Home: Nick.Holloway@alfie.demon.co.uk  http://www.alfie.demon.co.uk/
// ^ \\ | Work: Nick.Holloway@parallax.co.uk
