waitid() problem?

From: Sue Alfano
Date: Mon Feb 05 2007 - 07:13:50 EST


I didn't get any responses on this. I thought maybe the subject line
was misleading, so I'm trying a different one. If I'm using the wrong
mailing list, or if there's another site other than kernel.org that I
should be using to search through archives and FAQs, please let me
know.

I'm accessing the waitid function in the kernel using the _syscall4
macro since it's not supported in uClibc. waitid worked great, but
then I started having problems when more than one child needed to be
reaped.

Basically, the parent process registers a SIGCHLD signal handler via sigaction()
with the SA_RESTART option and then sits on a timed select(). All the signal
handler does is set a flag - the waitid processing takes place when the parent
drops out of the select because of the signal or a timeout. When the waitid
succeeds, the pid is used to do some processing and then is used to call
waitpid to clean up the mess. This is all in a while loop, so when the waitid
is called again, 0 is returned with si_pid = 0 indicating that there are no
more children to reap. That's all great. The problem occurs when there's 2
or more children to reap. When this occurs, the 2nd waitid call returns -1
and si_pid = 0. The waitid continues to fail in this manner until the parent
receives another SIGCHLD signal, then it works for the next zombie child in
the waiting line, but only one. I can send SIGCHLD kills to the parent from
the console and eventually get everything cleaned up. Sending the kill from
the parent isn't enough - I suppose it wants to be blocking on the select.

I tried registering the signal handler without SA_RESRART and also via signal(),
since there was a posting about SA_RESTART, but that made no difference. I
also tried not having a signal handler at all, but then I couldn't reap any
children on the timeout.

So, are there known issues with waitid when there are multiple children piled
up waiting to be reaped? If so, is there a patch or a workaround?

The product that I'm working on is using Linux version 2.6.12-2.0.0-258

Thanks for any help,
Sue
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/