[RFC] another signal oddity

From: Al Viro
Date: Sat Sep 04 2021 - 19:28:32 EST


Suppose we are sending e.g. SIGINT to a process
(by kill(2)). The target has two threads -
* thread1 (leader) that has SIGINT blocked
* thread2 that does *not* have SIGINT blocked
* thread2 is ptraced and running (not in ptrace stop).
* handler for SIGINT is SIG_DFL.

complete_signal() is called. wants_signal(SIGINT, thread1)
is false. type is not PIDTYPE_PID and thread_group_empty()
is false. wants_signal(SIGINT, thread2) is true, so we end
up with signal->curr_target and t set to thread2.
p is thread1.
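
A toy userspace model of that target selection, for illustration only.
This is not kernel code: struct model_task and the model_* helpers are
made-up names, and the eligibility check is reduced to "not blocked and
not stopped", which is an assumption that drops the other tests the real
code makes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct task_struct. */
struct model_task {
	bool sig_blocked;        /* the signal is in ->blocked */
	bool in_stop;            /* stopped or sitting in a ptrace stop */
	struct model_task *next; /* circular list of threads in the group */
};

/* Simplified eligibility check: not blocked and not stopped. */
static bool model_wants_signal(const struct model_task *t)
{
	return !t->sig_blocked && !t->in_stop;
}

/*
 * Pick the thread to wake for a signal sent to p's thread group.
 * *curr_target plays the role of signal->curr_target.
 * Returns NULL if no thread currently wants the signal.
 */
static struct model_task *model_pick_target(struct model_task *p,
					    struct model_task **curr_target,
					    bool single_thread_only)
{
	struct model_task *t;

	assert(p != NULL && curr_target != NULL && *curr_target != NULL);

	if (model_wants_signal(p))
		return p;
	if (single_thread_only || p->next == p)
		return NULL;

	/* Walk the group starting from the remembered target. */
	t = *curr_target;
	while (!model_wants_signal(t)) {
		t = t->next;
		if (t == *curr_target)
			return NULL;
	}
	*curr_target = t;
	return t;
}

/* The scenario above: returns true if thread2 ends up as the target. */
static bool model_scenario_picks_thread2(void)
{
	struct model_task thread1 = { .sig_blocked = true };
	struct model_task thread2 = { .sig_blocked = false };
	struct model_task *curr_target = &thread1;
	struct model_task *t;

	thread1.next = &thread2;
	thread2.next = &thread1;

	t = model_pick_target(&thread1, &curr_target, false);
	return t == &thread2 && curr_target == &thread2;
}
```

In this model p stays thread1 while t and the remembered target move to
thread2, which is the split between p and t that matters below.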

And then we hit this:
	if (sig_fatal(p, sig) &&
True - the handler is SIG_DFL and unhandled SIGINT is fatal
	    !(signal->flags & SIGNAL_GROUP_EXIT) &&
True - we are not in group exit.
	    !sigismember(&t->real_blocked, sig) &&
True - nobody is in sigtimedwait(), so ->real_blocked is empty.
	    (sig == SIGKILL || !p->ptrace)) {
Also true - thread1 is not ptraced.
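
To see how much hinges on whether the last test looks at p or at t, here
is a small userspace sketch of the condition. It is a model, not the
kernel source: struct model_cond and the model_* functions are
hypothetical names, and each field just records the truth value of the
corresponding kernel expression for the scenario above. The "checks t"
variant is the older !t->ptrace form mentioned further down.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the inputs to the condition. */
struct model_cond {
	bool fatal;           /* sig_fatal(p, sig) */
	bool group_exit;      /* signal->flags & SIGNAL_GROUP_EXIT */
	bool in_real_blocked; /* sigismember(&t->real_blocked, sig) */
	bool is_sigkill;      /* sig == SIGKILL */
	bool p_ptraced;       /* p->ptrace is non-zero */
	bool t_ptraced;       /* t->ptrace is non-zero */
};

/* Current form: the ptrace test looks at p (the task that was signalled). */
static bool model_group_exit_checks_p(const struct model_cond *c)
{
	return c->fatal && !c->group_exit && !c->in_real_blocked &&
	       (c->is_sigkill || !c->p_ptraced);
}

/* Older form: the ptrace test looks at t (the chosen target). */
static bool model_group_exit_checks_t(const struct model_cond *c)
{
	return c->fatal && !c->group_exit && !c->in_real_blocked &&
	       (c->is_sigkill || !c->t_ptraced);
}

/* SIGINT with SIG_DFL, p = thread1 (untraced), t = thread2 (traced). */
static struct model_cond model_scenario(void)
{
	struct model_cond c = {
		.fatal = true,
		.group_exit = false,
		.in_real_blocked = false,
		.is_sigkill = false,
		.p_ptraced = false,
		.t_ptraced = true,
	};
	return c;
}

/* Usage: the two forms disagree on this scenario. */
static void model_demo(void)
{
	struct model_cond c = model_scenario();

	assert(model_group_exit_checks_p(&c));  /* group exit is initiated */
	assert(!model_group_exit_checks_t(&c)); /* older test would not fire */
}
```

With these inputs the p-based test lets the group exit go ahead, while
the t-based test would leave SIGINT for thread2 to pick up normally.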

So we go ahead and initiate a group exit. Both thread1 and
thread2 get SIGKILL added to ->pending and are woken up.

But AFAICS we have no business doing that - thread1 has SIGINT
blocked, so get_signal() in it would not pick that SIGINT.
And thread2 is traced, so picking SIGINT would've hit
ptrace_signal(), stop and let the tracer deal with it. If
the tracer decides to cancel that SIGINT, we would continue
just fine.

Which order of execution could possibly lead to fatal signal
delivery?

IDGI... Looks like that !p->ptrace used to be !t->ptrace until
426915796cca "kernel/signal.c: remove the no longer needed
SIGNAL_UNKILLABLE check in complete_signal()" back in 2017,
but I don't see anything in the commit message that would explain
that part of the change. The testcase in there wouldn't care
either way...

What am I missing here?