Re: WARNING in rcu_check_gp_start_stall

From: Dmitry Vyukov
Date: Sat Feb 23 2019 - 05:33:49 EST


On Fri, Feb 22, 2019 at 11:20 PM Borislav Petkov <bp@xxxxxxxxx> wrote:
>
> On Fri, Feb 22, 2019 at 09:10:04AM -0800, syzbot wrote:
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit: 8a61716ff2ab Merge tag 'ceph-for-5.0-rc8' of git://github...
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=1531dd3f400000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=7132344728e7ec3f
> > dashboard link: https://syzkaller.appspot.com/bug?extid=111bc509cd9740d7e4aa
> > compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16d4966cc00000
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=10c492d0c00000
>
> So I ran this for more than an hour in a guest here with the above
> .config but nothing happened. The compiler I used is 8.2, dunno of that
> makes the difference or I'm missing something else...

I was able to reproduce this on the first run:

# ./syz-execprog -procs=8 -repeat=0 hang
2019/02/23 10:24:31 parsed 1 programs
2019/02/23 10:24:31 executed programs: 0
2019/02/23 10:24:36 executed programs: 23
2019/02/23 10:24:41 executed programs: 71
2019/02/23 10:24:46 executed programs: 118
2019/02/23 10:24:52 executed programs: 162
2019/02/23 10:24:57 executed programs: 208
2019/02/23 10:25:02 executed programs: 258
2019/02/23 10:25:07 executed programs: 288

And on the console:

[ 77.032078] sched: RT throttling activated
[ 183.901866] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 183.902595] rcu: (detected by 2, t=10502 jiffies, g=5945, q=335)
[ 183.903249] rcu: All QSes seen, last rcu_preempt kthread activity
10500 (4294955649-4294945149), jiffies_till_next_fqs=1, root ->qsmask
0x0
[ 183.904548] syz-executor R running task 56728 7574 6076 0x00000000
[ 183.905300] Call Trace:
[ 183.905570] <IRQ>
[ 183.905807] sched_show_task.cold+0x273/0x2d5
[ 183.906283] ? can_nice.part.0+0x20/0x20
[ 183.906708] ? kmsg_dump_rewind_nolock+0xe4/0xe4
[ 183.907205] ? print_usage_bug+0xd0/0xd0
[ 183.907629] ? __sanitizer_cov_trace_cmp8+0x18/0x20
[ 183.908149] ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[ 183.908729] print_other_cpu_stall.cold+0x7f2/0x8bb
[ 183.909260] ? print_cpu_stall+0x170/0x170
[ 183.909703] ? add_lock_to_list.isra.0+0x450/0x450
[ 183.910219] ? find_held_lock+0x35/0x120
[ 183.910643] ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[ 183.911217] ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[ 183.911791] ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[ 183.912367] ? check_preemption_disabled+0x48/0x290
[ 183.912889] ? __this_cpu_preempt_check+0x1d/0x30
[ 183.913389] ? rcu_preempt_need_deferred_qs+0x71/0x1a0
[ 183.913939] ? do_trace_rcu_torture_read+0x10/0x10
[ 183.914496] ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[ 183.915119] ? check_preemption_disabled+0x48/0x290
[ 183.915681] ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[ 183.916307] ? check_preemption_disabled+0x48/0x290
[ 183.916876] rcu_check_callbacks+0xf36/0x1380
[ 183.917381] ? account_system_index_time+0x31a/0x5f0
[ 183.917964] ? rcutree_dead_cpu+0x10/0x10
[ 183.918437] ? trace_hardirqs_off+0xb8/0x310
[ 183.918934] ? __lock_is_held+0xb6/0x140
[ 183.919392] ? trace_hardirqs_on_caller+0x310/0x310
[ 183.919966] ? check_preemption_disabled+0x48/0x290
[ 183.920536] ? raise_softirq+0x189/0x430
[ 183.920997] ? account_system_index_time+0x33f/0x5f0
[ 183.921575] ? raise_softirq_irqoff+0x2d0/0x2d0
[ 183.922110] ? check_preemption_disabled+0x48/0x290
[ 183.922679] ? __sanitizer_cov_trace_const_cmp1+0x1a/0x20
[ 183.923303] ? hrtimer_run_queues+0x99/0x410
[ 183.923806] ? run_local_timers+0x194/0x230
[ 183.924301] ? timer_clear_idle+0x90/0x90
[ 183.924777] ? account_process_tick+0x27f/0x350
[ 183.925314] ? ktime_mono_to_any+0x3a0/0x3a0
[ 183.925819] ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[ 183.926461] update_process_times+0x32/0x80
[ 183.926960] tick_sched_handle+0xa2/0x190
[ 183.927438] tick_sched_timer+0x47/0x130
[ 183.927905] __hrtimer_run_queues+0x3a7/0x1050
[ 183.928433] ? tick_sched_do_timer+0x1b0/0x1b0
[ 183.928959] ? hrtimer_fixup_init+0x90/0x90
[ 183.929450] ? kvm_clock_read+0x18/0x30
[ 183.929904] ? __sanitizer_cov_trace_cmp4+0x16/0x20
[ 183.930473] ? ktime_get_update_offsets_now+0x3d5/0x5e0
[ 183.931081] ? do_timer+0x50/0x50
[ 183.931474] ? add_lock_to_list.isra.0+0x450/0x450
[ 183.932032] ? rcu_softirq_qs+0x20/0x20
[ 183.932482] ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[ 183.933112] hrtimer_interrupt+0x314/0x770
[ 183.933600] smp_apic_timer_interrupt+0x18d/0x760
[ 183.934160] ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 183.934716] ? smp_call_function_single_interrupt+0x640/0x640
[ 183.935386] ? trace_hardirqs_off+0x310/0x310
[ 183.935897] ? task_prio+0x50/0x50
[ 183.936304] ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[ 183.936932] ? check_preemption_disabled+0x48/0x290
[ 183.937504] ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 183.938067] apic_timer_interrupt+0xf/0x20
[ 183.938549] </IRQ>
[ 183.938808] RIP: 0010:lock_is_held_type+0x17e/0x210
[ 183.939379] Code: 00 00 00 fc ff df 41 c7 85 7c 08 00 00 00 00 00
00 48 c1 e8 03 80 3c 10 00 75 63 48 83 3d f9 65 2e 08 00 74 30 48 89
df 57 9d <0f> 1f 44 00 00 48 83 c4 08 44 89 e0 5b 41 5c 41 5d 5d c3 48
83 c4
[ 183.941512] RSP: 0018:ffff88804c04f2c8 EFLAGS: 00000286 ORIG_RAX:
ffffffffffffff13
[ 183.942388] RAX: 1ffffffff132607e RBX: 0000000000000286 RCX: dffffc0000000000
[ 183.943208] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: 0000000000000286
[ 183.944032] RBP: ffff88804c04f2e8 R08: ffff88805c074000 R09: ffffed100d8a7f98
[ 183.944855] R10: ffffed100d8a7f97 R11: ffff88806c53fcbb R12: 0000000000000000
[ 183.945674] R13: ffff88805c074000 R14: dffffc0000000000 R15: 0000000000000008
[ 183.946508] ___might_sleep+0xd5/0x160
[ 183.946956] ? ext4_write_end+0x1090/0x1090
[ 183.947452] generic_perform_write+0x3fd/0x6a0
[ 183.947978] ? add_page_wait_queue+0x480/0x480
[ 183.948504] ? current_time+0x1b0/0x1b0
[ 183.948958] ? generic_write_check_limits+0x380/0x380
[ 183.949547] ? ext4_file_write_iter+0x28b/0x1440
[ 183.950091] __generic_file_write_iter+0x25e/0x630
[ 183.950659] ext4_file_write_iter+0x37a/0x1440
[ 183.951184] ? ext4_file_mmap+0x410/0x410
[ 183.951654] ? save_stack+0xa9/0xd0
[ 183.952065] ? save_stack+0x45/0xd0
[ 183.952477] ? __kasan_kmalloc.constprop.0+0xcf/0xe0
[ 183.953053] ? kasan_kmalloc+0x9/0x10
[ 183.953485] ? __kmalloc+0x15c/0x740
[ 183.953918] ? iter_file_splice_write+0x267/0xfc0
[ 183.954466] ? splice_direct_to_actor+0x3be/0x9d0
[ 183.955015] ? do_splice_direct+0x2c7/0x420
[ 183.955504] ? do_sendfile+0x61d/0xe60
[ 183.955948] ? __x64_sys_sendfile64+0x15a/0x240
[ 183.956475] ? do_syscall_64+0x1a3/0x800
[ 183.956939] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 183.957550] ? debug_lockdep_rcu_enabled+0x71/0xa0
[ 183.958115] ? common_file_perm+0x231/0x800
[ 183.958604] ? print_usage_bug+0xd0/0xd0
[ 183.959066] do_iter_readv_writev+0x902/0xbc0
[ 183.959582] ? vfs_dedupe_file_range+0x780/0x780
[ 183.960122] ? apparmor_file_permission+0x25/0x30
[ 183.960674] ? rw_verify_area+0x118/0x360
[ 183.961146] do_iter_write+0x184/0x610
[ 183.961587] ? pipe_to_sendpage+0x390/0x390
[ 183.962087] ? rcu_read_lock_sched_held+0x110/0x130
[ 183.962658] ? __kmalloc+0x5d5/0x740
[ 183.963082] vfs_iter_write+0x77/0xb0
[ 183.963515] iter_file_splice_write+0x885/0xfc0
[ 183.964044] ? fsnotify+0x4f5/0xed0
[ 183.964462] ? page_cache_pipe_buf_steal+0x800/0x800
[ 183.965046] ? rw_verify_area+0x118/0x360
[ 183.965516] ? page_cache_pipe_buf_steal+0x800/0x800
[ 183.966098] direct_splice_actor+0x126/0x1a0
[ 183.966598] splice_direct_to_actor+0x3be/0x9d0
[ 183.967125] ? generic_pipe_buf_nosteal+0x10/0x10
[ 183.967679] ? do_splice_to+0x190/0x190
[ 183.968134] ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[ 183.968762] ? rw_verify_area+0x118/0x360
[ 183.969233] do_splice_direct+0x2c7/0x420
[ 183.969709] ? splice_direct_to_actor+0x9d0/0x9d0
[ 183.970264] ? rcu_read_lock_sched_held+0x110/0x130
[ 183.970833] ? rcu_sync_lockdep_assert+0x73/0xb0
[ 183.971376] ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[ 183.972003] ? __sb_start_write+0x1aa/0x360
[ 183.972499] do_sendfile+0x61d/0xe60
[ 183.972926] ? do_compat_pwritev64+0x1c0/0x1c0
[ 183.973453] ? __sanitizer_cov_trace_const_cmp8+0x18/0x20
[ 183.974084] ? _copy_from_user+0xdd/0x150
[ 183.974561] __x64_sys_sendfile64+0x15a/0x240
[ 183.975080] ? __ia32_sys_sendfile+0x2a0/0x2a0
[ 183.975605] ? trace_hardirqs_on_thunk+0x1a/0x1c
[ 183.976151] do_syscall_64+0x1a3/0x800
[ 183.976598] ? syscall_return_slowpath+0x5f0/0x5f0
[ 183.977161] ? prepare_exit_to_usermode+0x3b0/0x3b0
[ 183.977733] ? __switch_to_asm+0x34/0x70
[ 183.978204] ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 183.978758] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 183.979348] RIP: 0033:0x457629
[ 183.979714] Code: 8d b5 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66
90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
08 0f 05 <48> 3d 01 f0 ff ff 0f 83 5b b5 fb ff c3 66 2e 0f 1f 84 00 00
00 00
[ 183.981845] RSP: 002b:00007f435f197c88 EFLAGS: 00000246 ORIG_RAX:
0000000000000028
[ 183.982718] RAX: ffffffffffffffda RBX: 000000000071bfa0 RCX: 0000000000457629
[ 183.983533] RDX: 0000000020000000 RSI: 0000000000000003 RDI: 0000000000000003
[ 183.984358] RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000
[ 183.985179] R10: 00008080fffffffe R11: 0000000000000246 R12: 00007f435f1986d4
[ 183.986005] R13: 00000000004abf30 R14: 00000000006eb8b8 R15: 00000000ffffffff



This is with qemu with 4 CPUs:

qemu-system-x86_64 -hda wheezy.img -net
user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic -nographic -kernel
arch/x86/boot/bzImage -append "kvm-intel.nested=1
kvm-intel.unrestricted_guest=1 kvm-intel.ept=1
kvm-intel.flexpriority=1 kvm-intel.vpid=1
kvm-intel.emulate_invalid_guest_state=1 kvm-intel.eptad=1
kvm-intel.enable_shadow_vmcs=1 kvm-intel.pml=1
kvm-intel.enable_apicv=1 console=ttyS0 root=/dev/sda
earlyprintk=serial slub_debug=UZ vsyscall=native rodata=n oops=panic
panic_on_warn=1 panic=86400 ima_policy=tcb" -enable-kvm -pidfile
vm_pid -m 2G -smp 4 -cpu host


There is a bunch of other bug reports about hangs where reproducers
mention perf_event_open and sched_setattr.