Re: [BUG, TEST PATCH] stallout race between SIGCONT and SIGSTOP

From: Oleg Nesterov
Date: Wed Sep 24 2008 - 11:50:36 EST


On 09/24, Joe Korty wrote:
>
> On Wed, Sep 24, 2008 at 11:05:41AM -0400, Oleg Nesterov wrote:
> > Joe says:
> >> So it looks like the test is in error, not the kernel.
> >
> > and I am happy to agree.
> > I think sigaction/10-1.c should be fixed, please see the patch below.
>
> A year or two ago I sent to Intel some OpenPosixTestSuite fixes, and they
> were accepted. Send it in (to the people listed in the comments at the
> front of the .c file), hopefully they are still at Intel.

OK, thanks, will do.

> > I did the test patch to be sure:
> >
> > --- 26-rc2/kernel/signal.c~ 2008-09-20 20:37:52.000000000 +0400
> > +++ 26-rc2/kernel/signal.c 2008-09-24 18:43:34.000000000 +0400
> > @@ -808,7 +808,7 @@ static int send_signal(int sig, struct s
> > * exactly one non-rt signal, so that we can get more
> > * detailed information about the cause of the signal.
> > */
> > - if (legacy_queue(pending, sig))
> > + if (sig != SIGCHLD && legacy_queue(pending, sig))
> > return 0;
> > /*
> > * fast-pathed signals for kernel-internal things like SIGSTOP
> >
> > and now your test-case doesn't hang.
>
> Very interesting! I am not sure this is Posix conformant,

No, no, the patch is of course wrong, I did it only to check my
understanding.

> as Posix
> seems to say that posting a SIGSTOP or SIGCHLD clears out all pending
> SIGSTOPs or SIGCHLDs,

Hmm. Are you sure?

Anyway, this is not what Linux does. If a non-rt signal is pending, the
next signal with the same number is silently ignored. SIGCHLD too.

> Still it might be workable

Confused. Do you agree the kernel is not buggy?

To clarify, none of SIGCONTs/SIGSTOPs is lost. But the test-case assumes
that it must always receive SIGCHLD + CLD_STOPPED. This is not true because
SIGCHLD is not queueable, and we have another "stream" of SIGCHLDs which
carry CLD_CONTINUED.

For example, the "opposite" code

kill(SIGSTOP);
kill(SIGCONT);
wait_for_CLD_CONTINUED();

was always wrong, but

kill(SIGCONT);
kill(SIGSTOP);
wait_for_CLD_STOPPED();

happened to work before that commit. But please note that it is wrong
anyway. For example, if we have another sub-thread, we can miss
CLD_STOPPED even without the commit which changed the timing.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/