Signal problems

Steven Suson (suson@tti.TuckerEnergy.com)
Tue, 27 Oct 1998 10:46:01 -0600

Greetings all,

While fighting some problems with signal handling and threads (and
yes, I'm aware that there are known problems, but we had no choice but to
get our apps working), we found a couple of problems in
kernel/signal.c. I've provided a patch below for one of them:
nr_queued_signals is incremented whether or not the kmem_cache_alloc
succeeds. The other problem, which I believe is a bug, occurs with the
following from the ignored_signal routine:

    if ((t->flags & PF_PTRACED) || sigismember(&t->blocked, sig))
            return 0;

Note that this means that even if the signal is ignored by the
process (i.e. set to SIG_IGN), if it is blocked, the kernel does not
discard it and the process still gets the signal. While it may not make
sense for a userland application to set up its signal handling in this
fashion, I do believe that it is incorrect for the kernel to deliver a
signal that is being ignored. However, I believe that SIGCLD is a
special case.
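
For anyone who wants to observe the behavior from userland, here is a minimal sketch. It is not from our application, and the function name is made up for illustration. It blocks SIGRTMIN, sets it to SIG_IGN, sends it to the current process, and reports whether it shows up as pending. POSIX leaves it unspecified whether an ignored signal that is also blocked is discarded on generation or stays pending, so either result is possible depending on the kernel.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if an ignored-and-blocked SIGRTMIN is
 * still pending after being sent, 0 if it was discarded, -1 on error.
 */
int ignored_blocked_stays_pending(void)
{
    sigset_t set, pending;
    struct sigaction sa;
    int result;

    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);
    if (sigprocmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;

    /* Ignore the signal while it is blocked. */
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGRTMIN, &sa, NULL) != 0)
        return -1;

    if (kill(getpid(), SIGRTMIN) != 0)
        return -1;
    if (sigpending(&pending) != 0)
        return -1;
    result = sigismember(&pending, SIGRTMIN);

    /* Clean up: unblocking while still ignored drops any pending copy. */
    sigprocmask(SIG_UNBLOCK, &set, NULL);
    sa.sa_handler = SIG_DFL;
    sigaction(SIGRTMIN, &sa, NULL);
    return result;
}
```

A result of 1 corresponds to the situation described above, where the blocked signal is kept even though its disposition is SIG_IGN.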
The correct solution to this problem was not obvious to us, but
perhaps it will be to someone else. The hack that worked for us (since
we only had the problem with RT signals, and again, NOT the correct
solution) was to modify the line like so:

    if ((sig < SIGRTMIN) && ((t->flags & PF_PTRACED) ||
                             sigismember(&t->blocked, sig)))

The other problems we had were due to some local methodologies.
However, I believe they raise some real concerns for signal handling
vis-à-vis threads in general. One real problem stems from the POSIX
specification that in a multi-threaded process, all threads share
the same sigaction structure (CLONE_SIGHAND), while each thread has
its own signal mask. So if the "process" receives a queued RT signal,
which is delivered to all threads, it remains queued until thread exit
for every thread that has it blocked. This breaks one of the standard
thread models for signal handling, which is to have one signal handling
thread (in a sigwait) while all other threads are not directly concerned
with the signal. In this case, we will likely overrun the maximum number
of queued signals.

This could probably be addressed if the kernel had an object or
association that I will call a "process container." This would allow
the kernel to recognize that a task is a member of a POSIX-like
"process," and treat it accordingly. With this, it would be possible to
select a thread to receive the signal, per the POSIX specs (for those
signals which only go to one thread, i.e. asynchronous signals). Thus
the RT signal would go to the thread which in fact handles it,
eliminating the queuing problem.
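
For reference, the "one signal handling thread" model mentioned above can be sketched as follows. This is an illustration only, with made-up function names, and it assumes a threads implementation that delivers a process-directed signal to a single thread as POSIX describes, which is exactly what we could not rely on.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical dedicated thread: it parks in sigwait() for SIGRTMIN. */
static void *signal_thread(void *arg)
{
    int *received = arg;
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);
    if (sigwait(&set, received) != 0)
        *received = -1;
    return NULL;
}

/* Returns 0 if the dedicated thread picked up the signal, -1 otherwise. */
int run_sigwait_model(void)
{
    sigset_t set;
    pthread_t tid;
    int received = 0;

    /* Block the signal first so every thread created later inherits the mask. */
    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);
    if (pthread_sigmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;
    if (pthread_create(&tid, NULL, signal_thread, &received) != 0)
        return -1;

    /* Send the signal to the process as a whole, not to a specific thread. */
    if (kill(getpid(), SIGRTMIN) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return received == SIGRTMIN ? 0 : -1;
}
```

The point of the model is that no other thread ever unblocks the signal, so only the sigwait thread consumes it and nothing is left queued on the other threads.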

Another problem which results from the lack of such a "process
container" is the inability to have one thread fork/exec a task, and
then have the "signal handling thread" reap the death of the child.

I am sure that there are many possible solutions to these issues.
Some of them include modifying the task structure (yeah, yeah, I
know....), or perhaps ensuring (through some flag?) that all
asynchronous signals are sent only to the "manager thread" (a
LinuxThreads term), and allowing it to decide somehow who receives the
signal.

It is our sincere belief that these issues must be addressed. In
order to become fully POSIX compliant, the kernel and userland
LinuxThreads must be brought into line with the POSIX model. I am
anxious to hear others' comments and suggestions for possible
solutions. We would be more than happy to participate in such a project.

Thanks for your time,
Steve Suson
"Keep the faith."

Anyway, here's the aforementioned patch.

--- signal.c.orig Sun Sep 13 19:11:35 1998
+++ signal.c Mon Oct 26 17:18:14 1998
@@ -333,10 +333,10 @@
         if (nr_queued_signals < max_queued_signals) {
                 q = (struct signal_queue *)
                     kmem_cache_alloc(signal_queue_cachep, GFP_KERNEL);
-                nr_queued_signals++;
         }
 
         if (q) {
+                nr_queued_signals++;
                 q->next = NULL;
                 *t->sigqueue_tail = q;
                 t->sigqueue_tail = &q->next;
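
To make the effect of the reordering concrete, here is a small userland model of the two orderings. It is a sketch under assumptions, not kernel code: struct model_queue, model_alloc and the enqueue functions are invented names, and a forced allocation failure stands in for kmem_cache_alloc returning NULL.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for nr_queued_signals / max_queued_signals. */
struct model_queue {
    int nr_queued;
    int max_queued;
};

/* Hypothetical allocator: `fail` forces a NULL return. */
static void *model_alloc(int fail)
{
    return fail ? NULL : malloc(1);
}

/* Ordering before the patch: the counter is bumped even if alloc failed. */
int enqueue_before_patch(struct model_queue *mq, int fail)
{
    void *q = NULL;

    assert(mq != NULL);
    if (mq->nr_queued < mq->max_queued) {
        q = model_alloc(fail);
        mq->nr_queued++;
    }
    if (q) {
        free(q);    /* the model does not keep the entry around */
        return 1;
    }
    return 0;
}

/* Ordering after the patch: the counter only moves once q is known good. */
int enqueue_after_patch(struct model_queue *mq, int fail)
{
    void *q = NULL;

    assert(mq != NULL);
    if (mq->nr_queued < mq->max_queued)
        q = model_alloc(fail);
    if (q) {
        mq->nr_queued++;
        free(q);
        return 1;
    }
    return 0;
}
```

With a forced failure, the first version leaves nr_queued at 1 although nothing was queued, while the second leaves it at 0.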

