Re: WARNING in task_participate_group_stop

From: Oleg Nesterov
Date: Mon Nov 02 2015 - 09:17:25 EST


Hi Dmitry,

On 11/02, Dmitry Vyukov wrote:
>
> WARNING: CPU: 1 PID: 1 at kernel/signal.c:334 task_participate_group_stop+0x157/0x1d0()
> Modules linked in:
> CPU: 1 PID: 1 Comm: init Not tainted 4.3.0 #48
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> ffffffff82e40280 ffff88003eb0fae0 ffffffff819efe55 0000000000000000
> ffff88003eb0fb20 ffffffff810ec871 ffffffff8110f4d7 ffff88003eb00000
> ffff88003eb20000 0000000000000000 ffff88003eb0fbf8 ffff88003eb20000
> Call Trace:
> [<ffffffff810eca35>] warn_slowpath_null+0x15/0x20 kernel/panic.c:480
> [<ffffffff8110f4d7>] task_participate_group_stop+0x157/0x1d0 kernel/signal.c:334
> [<ffffffff81113587>] do_signal_stop+0x1e7/0x6e0 kernel/signal.c:2060
> [<ffffffff81116ab7>] get_signal+0x387/0x11b0 kernel/signal.c:2316
> [<ffffffff8100cf0d>] do_signal+0x8d/0x19e0 arch/x86/kernel/signal.c:707
> [<ffffffff81005d8d>] prepare_exit_to_usermode+0x11d/0x170 arch/x86/entry/common.c:251
> [<ffffffff81005e83>] syscall_return_slowpath+0xa3/0x2b0 arch/x86/entry/common.c:317
> [<ffffffff82d4f6a7>] int_ret_from_sys_call+0x25/0x8f arch/x86/entry/entry_64.S:281
> ---[ end trace f6697fd630b7c361 ]---
>
>
> The reproducer is (needs to be run as root):
>
> // autogenerated by syzkaller (http://github.com/google/syzkaller)
> #include <sys/ptrace.h>
> #include <unistd.h>
>
> int main()
> {
> 	int pid = 1;
> 	ptrace(PTRACE_ATTACH, pid, 0, 0);
> 	ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_EXITKILL);
> 	sleep(1);
> 	return 0;
> }

Thanks.

I can't reproduce this, but at first glance the problem looks clear...

> Yes, it is weird and it kills init right afterwards.

Could you confirm that this WARN_ON() happens _after_ the reproducer exits?

> But I wasn't able
> to figure out what's the root cause (why task does not have
> JOBCTL_STOP_PENDING) and maybe the same WARNING can be triggered
> without root and/or with other than init process. So still posting it
> here.

Yes, I think you are right. SIGSTOP can race with SIGKILL, which (unlike SIGCONT)
doesn't clear JOBCTL_STOP_DEQUEUED/PENDING/etc.

This is mostly fine: the task won't block in TASK_STOPPED if SIGKILL is pending.
But it is still not right and leads to the warning above: JOBCTL_STOP_PENDING was
not set because do_signal_stop()->task_set_jobctl_pending() checks
fatal_signal_pending() and refuses to set the bit when it is true.

The patch below should probably fix the problem, but I'd like to think about it
more before I send a proper fix.

Oleg.

--- x/kernel/signal.c
+++ x/kernel/signal.c
@@ -2002,7 +2002,7 @@ static bool do_signal_stop(int signr)
 		WARN_ON_ONCE(signr & ~JOBCTL_STOP_SIGMASK);
 
 		if (!likely(current->jobctl & JOBCTL_STOP_DEQUEUED) ||
-		    unlikely(signal_group_exit(sig)))
+		    unlikely(fatal_signal_pending(current)))
 			return false;
 		/*
 		 * There is no group stop already in progress. We must
