Re: WARNING in set_restore_sigmask

From: Dmitry Vyukov
Date: Fri Jan 29 2016 - 09:05:41 EST


On Fri, Jan 29, 2016 at 2:57 PM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> On Fri, 29 Jan 2016, Dmitry Vyukov wrote:
>> On Fri, Jan 29, 2016 at 12:53 PM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>> > Dmitry,
>> >
>> > On Fri, 29 Jan 2016, Dmitry Vyukov wrote:
>> >> WARNING: CPU: 2 PID: 10905 at ./arch/x86/include/asm/thread_info.h:236 sigsuspend+0x18e/0x1f0()
>> >> Modules linked in:
>> >> CPU: 2 PID: 10905 Comm: syz-executor Not tainted 4.5.0-rc1+ #300
>> >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> >> 00000000ffffffff ffff88006139fe38 ffffffff82be118d 0000000000000000
>> >> ffff88006d054740 ffffffff867387e0 ffff88006139fe78 ffffffff813536d9
>> >> ffffffff813839ce ffffffff867387e0 00000000000000ec 0000000020000000
>> >> Call Trace:
>> >> [< inline >] __dump_stack lib/dump_stack.c:15
>> >> [<ffffffff82be118d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
>> >> [<ffffffff813536d9>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
>> >> [<ffffffff81353909>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
>> >> [< inline >] set_restore_sigmask ./arch/x86/include/asm/thread_info.h:236
>> >> [<ffffffff813839ce>] sigsuspend+0x18e/0x1f0 kernel/signal.c:3513
>> >> [< inline >] SYSC_rt_sigsuspend kernel/signal.c:3533
>> >> [<ffffffff81387d7c>] SyS_rt_sigsuspend+0xac/0xe0 kernel/signal.c:3523
>> >> [<ffffffff86653236>] entry_SYSCALL_64_fastpath+0x16/0x7a arch/x86/entry/entry_64.S:185
>> >> ---[ end trace da5c27e3b7defd96 ]---
>> >
>> > That could be just a spurious wakeup of unknown provenience. The sigsuspend
>> > code has no protection against those. I can't see why that happens ...
>>
>> You mean that you _see_ why this warning happens?
>
> It happens when a spurious wakeup occurs, but I don't see how that happens in
> your fuzzing apps.
>
>> >
>> >> Unfortunately I cannot reproduce it. But the only two programs that
>> >
>> > It would be helpful if you could run your fuzzers with a minimal set of trace
>> > points enabled (raw_syscalls, sched events) and set
>> > /proc/sys/kernel/traceoff_on_warning to 1, so the trace freezes when a warning
>> > is triggered.
>> >
>> > That might give us at least some insight into these one off issues.
>>
>> Can you please give more concrete instructions? I never used
>> tracepoints. What configs do I need to enable? What runtime setup? etc
>
> CONFIG_FTRACE=y
>
> Then after booting:
>
> mount debugfs if not mounted already
>
> # mount -t debugfs debugfs /sys/kernel/debug
>
> # echo 1 > /sys/kernel/debug/tracing/events/raw_syscalls/enable
> # echo 1 > /sys/kernel/debug/tracing/events/sched/enable
> # echo 1 > /proc/sys/kernel/traceoff_on_warning
>
> That freezes the trace when a warning/bug is hit. You can then retrieve the
> trace via:
>
> # cat /sys/kernel/debug/tracing/trace
>
> You can also do
>
> # echo 1 > /proc/sys/kernel/ftrace_dump_on_oops
>
> which will spill out the trace buffer over serial console. That's useful if
> your kernel crashes completely. But be aware that it might take quite some
> time ....

I need something that will work without supervision. I need to use
/proc/sys/kernel/ftrace_dump_on_oops instead of
/proc/sys/kernel/traceoff_on_warning then, right?

Quite some time? Does it dump the trace from boot? In my setup the kernel
can run for up to an hour under a very heavy parallel workload, so I need
to check how it copes with that.
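
For an unattended run, the setup quoted above could be wrapped in a small
script. This is only a minimal sketch, assuming a POSIX shell running as root
on a kernel with CONFIG_FTRACE=y. The function name setup_tracing and its two
optional directory arguments are hypothetical, added so it can be dry-run
against a scratch directory. The paths and knobs are the ones from the
instructions above.

```shell
#!/bin/sh
# Sketch only: setup_tracing and its arguments are illustrative names,
# not part of any kernel tooling.
#   $1: tracing directory (default: /sys/kernel/debug/tracing)
#   $2: kernel sysctl directory (default: /proc/sys/kernel)
setup_tracing() {
    tracing_dir="${1:-/sys/kernel/debug/tracing}"
    sysctl_dir="${2:-/proc/sys/kernel}"

    # Mount debugfs only if the tracing directory is not visible yet.
    if [ ! -d "$tracing_dir" ]; then
        mount -t debugfs debugfs /sys/kernel/debug || return 1
    fi

    # Enable the minimal set of trace points: syscall entry/exit and
    # scheduler events.
    echo 1 > "$tracing_dir/events/raw_syscalls/enable" || return 1
    echo 1 > "$tracing_dir/events/sched/enable" || return 1

    # Freeze the trace buffer as soon as a warning is triggered.
    echo 1 > "$sysctl_dir/traceoff_on_warning" || return 1
}
```

Calling setup_tracing with no arguments from the VM's init script would apply
the settings to the real paths. Writing 1 to
/proc/sys/kernel/ftrace_dump_on_oops is left out on purpose, since whether to
use it is still the open question.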