Re: unexpected kernel reboot (3)
From: Dmitry Vyukov
Date: Wed Mar 11 2020 - 16:18:12 EST
> On Monday, July 16, 2018 at 12:10:07 PM UTC+2, Dmitry Vyukov wrote:
>>
>> On Fri, Jul 13, 2018 at 11:58 PM, Andrew Morton
>> <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>> > On Fri, 13 Jul 2018 14:39:02 -0700 syzbot <syzbot+cce9ef2dd25246f815ee@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>> >
>> >> Hello,
>> >>
>> >> syzbot found the following crash on:
>> >
>> > hm, I don't think I've seen an "unexpected reboot" report before.
>> >
>> > Can you expand on specifically what happened here? Did the machine
>> > simply magically reboot itself? Or did an external monitor whack it,
>> > or...
>>
>> We ran a user-space workload (not involving the reboot syscall), and
>> the machine suddenly rebooted. We don't know what triggered the
>> reboot; we only see the consequences. We've seen a few such bugs
>> before, e.g.:
>> https://syzkaller.appspot.com/bug?id=4f1db8b5e7dfcca55e20931aec0ee707c5cafc99
>> Usually it involves KVM. Potentially it's a bug in the outer
>> kernel/VMM; it may or may not be present in the tip kernel.
>>
>>
>> > Does this test distinguish from a kernel which simply locks up?
>>
>> Yes. If you look at the log:
>>
>> https://syzkaller.appspot.com/x/log.txt?x=17c6a6d0400000
>>
>> We've booted the machine, started running a program, and then boom! it
>> reboots without any other diagnostics. It's not a hang.
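
To illustrate the distinction: a rough sketch of how a console log could
be classified, assuming a reboot shows up as a second boot banner at
timestamp 0.000000 with no crash message before it. The patterns and the
function name are my assumptions for illustration, not syzkaller's
actual report parser. A hang would come out as "no reboot" here, because
the log simply stops.

```python
import re

# Assumed markers, chosen for illustration only.
BOOT_BANNER = re.compile(r"\[\s*0\.000000\]\s+Linux version ")
DIAGNOSTIC = re.compile(r"Kernel panic|BUG:|WARNING:|Oops:|reboot: Restarting system")


def classify_console_log(text):
    """Return 'unexpected reboot', 'diagnosed', or 'no reboot'."""
    booted = False
    saw_diagnostic = False
    for line in text.splitlines():
        if BOOT_BANNER.search(line):
            if booted and not saw_diagnostic:
                # A second boot banner with nothing explaining it.
                return "unexpected reboot"
            booted = True
        elif booted and DIAGNOSTIC.search(line):
            saw_diagnostic = True
    return "diagnosed" if saw_diagnostic else "no reboot"
```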
>>
>>
>>
>> >> HEAD commit: 1e4b044d2251 Linux 4.18-rc4
>> >> git tree: upstream
>> >> console output: https://syzkaller.appspot.com/x/log.txt?x=17c6a6d0400000
>> >> kernel config: https://syzkaller.appspot.com/x/.config?x=25856fac4e580aa7
>> >> dashboard link: https://syzkaller.appspot.com/bug?extid=cce9ef2dd25246f815ee
This has happened 10K+ times.
If a GCE VM is rebooted by doing something with the KVM subsystem, I
assume it's a GCE bug (?). +Jim
>> >> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
>> >> syzkaller repro: https://syzkaller.appspot.com/x/repro.syz?x=165012c2400000
>> >> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1571462c400000
>> >
>> > I assume the "C reproducer" is irrelevant here.
>> >
>> > Is it reproducible?
>>
>> Yes, it is reproducible and the C reproducer is relevant.
>> If syzbot provides a reproducer, it means that it booted a clean
>> machine, ran the provided program (nothing else besides typical init
>> code and ssh/scp invocation) and that's the kernel output it observed
>> running this exact program.
>> However, in this case the exact setup can be relevant. syzbot uses GCE
>> VMs, so it may or may not reproduce with other VMMs/physical hardware;
>> sometimes such bugs depend on the exact CPU type.
>>
>>
>> >> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> >> Reported-by: syzbot+cce9ef2dd25246f815ee@xxxxxxxxxxxxxxxxxxxxxxxxx
>> >>
>> >> output_len: 0x00000000092459b0
>> >> kernel_total_size: 0x000000000a505000
>> >> trampoline_32bit: 0x000000000009d000
>> >>
>> >> Decompressing Linux... Parsing ELF... done.
>> >> Booting the kernel.
>> >> [ 0.000000] Linux version 4.18.0-rc4+ (syzkaller@ci) (gcc version 8.0.1
>> >> 20180413 (experimental) (GCC)) #138 SMP Mon Jul 9 10:45:11 UTC 2018
>> >> [ 0.000000] Command line: BOOT_IMAGE=/vmlinuz root=/dev/sda1
>> >> console=ttyS0 earlyprintk=serial vsyscall=native rodata=n
>> >> ftrace_dump_on_oops=orig_cpu oops=panic panic_on_warn=1 nmi_watchdog=panic
>> >> panic=86400 workqueue.watchdog_thresh=140 kvm-intel.nested=1
>> >>
>> >> ...
>> >>
>> >> regulatory database
>> >> [ 4.519364] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
>> >> [ 4.520839] platform regulatory.0: Direct firmware load for
>> >> regulatory.db failed with error -2
>> >> [ 4.522155] cfg80211: failed to load regulatory.db
>> >> [ 4.522185] ALSA device list:
>> >> [ 4.523499] #0: Dummy 1
>> >> [ 4.523951] #1: Loopback 1
>> >> [ 4.524389] #2: Virtual MIDI Card 1
>> >> [ 4.825991] input: ImExPS/2 Generic Explorer Mouse as
>> >> /devices/platform/i8042/serio1/input/input4
>> >> [ 4.829533] md: Waiting for all devices to be available before autodetect
>> >> [ 4.830562] md: If you don't use raid, use raid=noautodetect
>> >> [ 4.835237] md: Autodetecting RAID arrays.
>> >> [ 4.835882] md: autorun ...
>> >> [ 4.836364] md: ... autorun DONE.
>> >
>> > Can we assume that the failure occurred in or immediately after the MD code,
>> > or might some output have been truncated?
>> >
>> > It would be useful to know what the kernel was initializing immediately
>> > after MD. Do you have a kernel log for the same config when the kernel
>> > didn't fail? Or maybe enable initcall_debug?
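
For reference, with initcall_debug on the command line the kernel logs a
"calling  <fn> @ <pid>" line before each initcall and an "initcall <fn>
returned <ret> after <N> usecs" line after it. A rough sketch of finding
initcalls that started but never returned in a captured log (the
function name is made up for illustration, and the regexes assume that
line format):

```python
import re

# Assumed initcall_debug line formats.
CALLING = re.compile(r"calling\s+(\S+) @ \d+")
RETURNED = re.compile(r"initcall (\S+) returned -?\d+ after \d+ usecs")


def unfinished_initcalls(log_text):
    """Return initcalls that logged 'calling' but never 'returned'."""
    pending = []
    for line in log_text.splitlines():
        started = CALLING.search(line)
        if started:
            pending.append(started.group(1))
            continue
        finished = RETURNED.search(line)
        if finished and finished.group(1) in pending:
            pending.remove(finished.group(1))
    return pending
```

Whatever is left in the returned list when the log ends is a candidate
for where the machine died.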