Re: [PATCH] signals: work around random wakeups in sigsuspend()

From: Oleg Nesterov
Date: Wed Jan 27 2016 - 11:42:04 EST

On 01/27, Peter Zijlstra wrote:
> On Tue, Jan 26, 2016 at 10:10:09PM +0100, Oleg Nesterov wrote:
> > And, ironically, there is another more serious "reverse" problem ;) sigsuspend()
> > orany other user of -ERESTARTNOHAND can "miss" the signal, in a sense that the
> > kernel can wrongly restart this syscall after return from signal handler. This
> > is not trivial to fix..
> So I'm not entirely sure I get what you mean there.

Yes, sorry. When I re-read my email it really looks like "I know the secret but I
won't tell you" ;)

The problem is simple, the fix is not. -ERESTARTNOHAND means that we should restart
if do_signal()->get_signal() returns zero. This can happen when, say, another thread
has already dequeued the group-wide signal which was the reason for TIF_SIGPENDING.
Or signal_pending() was true because of debuger, or another reason, doesn't matter.

In this case do_signal() does

regs->ax = regs->orig_ax;
regs->ip -= 2;

and we return to user space. Note that after that the kernel can't know that the
task is going to restart the syscall which should be interrupted by signals.

Now suppose that another signal (with the handler) comes before this task executes
the "syscall" insn. In this case sigsuspend() will restart after the task runs the

> But it did get me to
> look at the patch again:
> + while (!signal_pending(current)) {
> + __set_current_state(TASK_INTERRUPTIBLE);
> + schedule();
> + }
> That should very much be:
> for (;;) {
> set_current_state(TASK_INTERRUPTIBLE);
> if (signal_pending(current))
> break;
> schedule();
> }
> __set_current_state(TASK_RUNNING);

Why? It should work either way. Yes, signal_wakeup() can come right before
__set_current_state(TASK_INTERRUPTIBLE) but this is fine, __schedule() must not
sleep if signal_pending() == T, that is why it checks signal_pending_state().
See also the comment above smp_mb__before_spinlock() in schedule().

IOW, signal_pending() is the "special" condition, you do not need to serialize
this check with task->state setting, exactly because schedule() knows about the