Re: WARNING in task_participate_group_stop
From: Oleg Nesterov
Date: Mon Nov 02 2015 - 09:36:52 EST
On 11/02, Dmitry Vyukov wrote:
>
> On Mon, Nov 2, 2015 at 4:13 PM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> > Hi Dmitry,
> >
> > On 11/02, Dmitry Vyukov wrote:
> >>
> >> WARNING: CPU: 1 PID: 1 at kernel/signal.c:334
> >> task_participate_group_stop+0x157/0x1d0()
> >> Modules linked in:
> >> CPU: 1 PID: 1 Comm: init Not tainted 4.3.0 #48
> >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> >> ffffffff82e40280 ffff88003eb0fae0 ffffffff819efe55 0000000000000000
> >> ffff88003eb0fb20 ffffffff810ec871 ffffffff8110f4d7 ffff88003eb00000
> >> ffff88003eb20000 0000000000000000 ffff88003eb0fbf8 ffff88003eb20000
> >> Call Trace:
> >> [<ffffffff810eca35>] warn_slowpath_null+0x15/0x20 kernel/panic.c:480
> >> [<ffffffff8110f4d7>] task_participate_group_stop+0x157/0x1d0
> >> kernel/signal.c:334
> >> [<ffffffff81113587>] do_signal_stop+0x1e7/0x6e0 kernel/signal.c:2060
> >> [<ffffffff81116ab7>] get_signal+0x387/0x11b0 kernel/signal.c:2316
> >> [<ffffffff8100cf0d>] do_signal+0x8d/0x19e0 arch/x86/kernel/signal.c:707
> >> [<ffffffff81005d8d>] prepare_exit_to_usermode+0x11d/0x170
> >> arch/x86/entry/common.c:251
> >> [<ffffffff81005e83>] syscall_return_slowpath+0xa3/0x2b0
> >> arch/x86/entry/common.c:317
> >> [<ffffffff82d4f6a7>] int_ret_from_sys_call+0x25/0x8f
> >> arch/x86/entry/entry_64.S:281
> >> ---[ end trace f6697fd630b7c361 ]---
> >>
> >>
> >> The reproducer is (needs to be run as root):
> >>
> >> // autogenerated by syzkaller (http://github.com/google/syzkaller)
> >> #include <sys/ptrace.h>
> >> #include <unistd.h>
> >>
> >> int main()
> >> {
> >> int pid = 1;
> >> ptrace(PTRACE_ATTACH, pid, 0, 0);
> >> ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_EXITKILL);
> >> sleep(1);
> >> return 0;
> >> }
> >
> > Thanks.
> >
> > Can't reproduce, but at first glance the problem looks clear...
>
> Humm... did you run as root?
Yes,
> It reproduces all the time on my 4.3 kernel VM. Also firmly killed my
> desktop running 3.13.
Yes, it kills init and crashes the kernel. But I do not see the warning.
> >> Yes, it is weird and it kills init right afterwards.
> >
> > Could you confirm that this WARN_ON() happens _after_ the reproducer exits?
> >
> >> But I wasn't able
> >> to figure out what's the root cause (why task does not have
> >> JOBCTL_STOP_PENDING) and maybe the same WARNING can be triggered
> >> without root and/or with other than init process. So still posting it
> >> here.
> >
> > Yes I think you are right. SIGSTOP can race with SIGKILL which (unlike SIGCONT)
> > doesn't clear JOBCTL_STOP_DEQUEUED/PENDING/etc.
> >
> > This is mostly fine, the task won't block in TASK_STOPPED if SIGKILL is pending,
> > but still is not right and leads to the warning above: JOBCTL_STOP_PENDING was not
> > set because do_signal_stop()->task_set_jobctl_pending() checks fatal_signal_pending().
On a second thought, in this particular case (your test-case), SIGSTOP/SIGKILL
do not race, although (so far) I think this doesn't matter. JOBCTL_STOP_PENDING
comes from __ptrace_unlink() when the tracee already has the pending SIGKILL
due to PTRACE_O_EXITKILL.
Now. If the tracee (init) wakes up and dequeues SIGKILL before __ptrace_unlink()
adds JOBCTL_STOP_PENDING, it won't see JOBCTL_STOP_PENDING and probably this is
what happens on my testing machine.
Perhaps __ptrace_unlink() should me more carefull too...
> > Probably the patch below should fix the problem, but I'd like to think more before
> > I send the fix.
>
>
> Will test it.
Great, thanks.
Oleg.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/