Re: [syzbot] [kernel?] WARNING: suspicious RCU usage in __do_softirq
From: Paul E. McKenney
Date: Tue Apr 16 2024 - 21:13:43 EST
On Tue, Apr 16, 2024 at 04:44:54PM +0800, Z qiang wrote:
> On Tue, Apr 16, 2024 at 4:10 PM Z qiang <qiang.zhang1211@xxxxxxxxx> wrote:
> >
> > Cc: Paul
> > >
> > > Hello,
> > >
> > > syzbot found the following issue on:
> > >
> > > HEAD commit: c0b832517f62 Add linux-next specific files for 20240402
> > > git tree: linux-next
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=15f64776180000
> > > kernel config: https://syzkaller.appspot.com/x/.config?x=afcaf46d374cec8c
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=dce04ed6d1438ad69656
> > > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10f00471180000
> > >
> > > Downloadable assets:
> > > disk image: https://storage.googleapis.com/syzbot-assets/0d36ec76edc7/disk-c0b83251.raw.xz
> > > vmlinux: https://storage.googleapis.com/syzbot-assets/6f9bb4e37dd0/vmlinux-c0b83251.xz
> > > kernel image: https://storage.googleapis.com/syzbot-assets/2349287b14b7/bzImage-c0b83251.xz
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: syzbot+dce04ed6d1438ad69656@xxxxxxxxxxxxxxxxxxxxxxxxx
> > >
> > > =============================
> > > WARNING: suspicious RCU usage
> > > 6.9.0-rc2-next-20240402-syzkaller #0 Not tainted
> > > -----------------------------
> > > kernel/rcu/tree.c:276 Illegal rcu_softirq_qs() in RCU read-side critical section!
> > >
> > > other info that might help us debug this:
> > >
> > >
> > > rcu_scheduler_active = 2, debug_locks = 1
> > > 1 lock held by ksoftirqd/0/16:
> > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline]
> > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: rcu_read_lock_sched include/linux/rcupdate.h:933 [inline]
> > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: pfn_valid include/linux/mmzone.h:2019 [inline]
> > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: __virt_addr_valid+0x183/0x520 arch/x86/mm/physaddr.c:65
> > >
> > > stack backtrace:
> > > CPU: 0 PID: 16 Comm: ksoftirqd/0 Not tainted 6.9.0-rc2-next-20240402-syzkaller #0
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
> > > Call Trace:
> > > <IRQ>
> > > __dump_stack lib/dump_stack.c:88 [inline]
> > > dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
> > > lockdep_rcu_suspicious+0x221/0x340 kernel/locking/lockdep.c:6712
> > > rcu_softirq_qs+0xd9/0x370 kernel/rcu/tree.c:273
> > > __do_softirq+0x5fd/0x980 kernel/softirq.c:568
Huh. This statement is supposed to prevent this call to rcu_softirq_qs()
when __do_softirq() is invoked from interrupt exit:

	if (!IS_ENABLED(CONFIG_PREEMPT_RT) &&
	    __this_cpu_read(ksoftirqd) == current)
So was the ksoftirqd kthread interrupted at a point where it happens to
have softirq enabled?
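
To make the question concrete, here is a minimal userspace sketch of that scenario. It is a model, not kernel code: every model_* name is hypothetical, and it assumes the interrupt really did arrive while ksoftirqd had softirqs enabled and was inside the rcu_read_lock_sched() section shown in the trace. It compares the quoted check, which is keyed on task identity, against an assumed alternative in which the caller states whether it is ksoftirqd's own loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct model_task {
	const char *comm;
	int sched_rcu_depth;	/* models rcu_read_lock_sched() nesting */
};

struct model_cpu {
	struct model_task *ksoftirqd;	/* models __this_cpu_read(ksoftirqd) */
	struct model_task *curr;	/* models current */
	int illegal_qs;			/* models the lockdep splat count */
};

/* Models rcu_softirq_qs(): a quiescent state inside a reader is illegal. */
static void model_rcu_softirq_qs(struct model_cpu *cpu)
{
	if (cpu->curr->sched_rcu_depth > 0)
		cpu->illegal_qs++;
}

/* The quoted check: keyed only on which task is current. */
static void model_softirq_check_task(struct model_cpu *cpu)
{
	if (cpu->ksoftirqd == cpu->curr)
		model_rcu_softirq_qs(cpu);
}

/* Assumed alternative: the caller says whether it is ksoftirqd's own loop. */
static void model_softirq_check_caller(struct model_cpu *cpu,
				       bool from_ksoftirqd_loop)
{
	if (from_ksoftirqd_loop)
		model_rcu_softirq_qs(cpu);
}

/*
 * Replays the trace: ksoftirqd is inside a sched-RCU reader (pfn_valid())
 * when a timer interrupt arrives and irq exit processes softirqs.
 * Returns the number of illegal quiescent states the model observed.
 */
int model_irq_exit_on_ksoftirqd(bool check_by_task)
{
	struct model_task ksoftirqd = { "ksoftirqd/0", 0 };
	struct model_cpu cpu = { &ksoftirqd, &ksoftirqd, 0 };

	ksoftirqd.sched_rcu_depth++;		/* rcu_read_lock_sched() */
	if (check_by_task)
		model_softirq_check_task(&cpu);	/* current is ksoftirqd */
	else
		model_softirq_check_caller(&cpu, false); /* came from irq exit */
	ksoftirqd.sched_rcu_depth--;		/* rcu_read_unlock_sched() */

	assert(ksoftirqd.sched_rcu_depth == 0);
	return cpu.illegal_qs;
}
```

In this model, model_irq_exit_on_ksoftirqd(true) counts one illegal quiescent state, because the interrupted task is still ksoftirqd, while model_irq_exit_on_ksoftirqd(false) counts none. Whether the real kernel can reach this state is exactly the open question.
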
Thanx, Paul
> > > invoke_softirq kernel/softirq.c:428 [inline]
> > > __irq_exit_rcu+0xf2/0x1c0 kernel/softirq.c:633
> > > irq_exit_rcu+0x9/0x30 kernel/softirq.c:645
> > > instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline]
> > > sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043
> > > </IRQ>
> > > <TASK>
> > > asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
> > > RIP: 0010:debug_lockdep_rcu_enabled+0xd/0x40 kernel/rcu/update.c:320
> > > Code: f5 90 0f 0b 90 90 90 eb c6 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 31 c0 83 3d c7 0f 28 04 00 <74> 1e 83 3d 26 42 28 04 00 74 15 65 48 8b 0c 25 c0 d3 03 00 31 c0
> > > RSP: 0018:ffffc90000157a50 EFLAGS: 00000202
> > > RAX: 0000000000000000 RBX: 00000000000000a0 RCX: 0000000000000001
> > > RDX: dffffc0000000000 RSI: ffffffff8bcae740 RDI: ffffffff8c1f7ec0
> > > RBP: dffffc0000000000 R08: ffffffff92f3a527 R09: 1ffffffff25e74a4
> > > R10: dffffc0000000000 R11: fffffbfff25e74a5 R12: 0000000029373578
> > > R13: 1ffff9200002af64 R14: ffffffff814220f3 R15: ffff88813fff90a0
> > > rcu_read_lock_sched include/linux/rcupdate.h:934 [inline]
> > > pfn_valid include/linux/mmzone.h:2019 [inline]
> > > __virt_addr_valid+0x1a9/0x520 arch/x86/mm/physaddr.c:65
> > > kasan_addr_to_slab+0xd/0x80 mm/kasan/common.c:37
> > > __kasan_record_aux_stack+0x11/0xc0 mm/kasan/generic.c:526
> >
> >
> > This should be caused by the following commit:
> > d818cc76e2b4 ("kasan: Record work creation stack trace with interrupts enabled")
> >
> > Is it possible to make rcu_softirq_qs() run only in the ksoftirqd task?
>
> For example, use rcu_softirq_qs_periodic() in run_ksoftirqd().
>
> >
> > Thanks
> > Zqiang
> >
> > > __call_rcu_common kernel/rcu/tree.c:3096 [inline]
> > > call_rcu+0x167/0xa70 kernel/rcu/tree.c:3200
> > > context_switch kernel/sched/core.c:5412 [inline]
> > > __schedule+0x17f0/0x4a50 kernel/sched/core.c:6746
> > > __schedule_loop kernel/sched/core.c:6823 [inline]
> > > schedule+0x14b/0x320 kernel/sched/core.c:6838
> > > smpboot_thread_fn+0x61e/0xa30 kernel/smpboot.c:160
> > > kthread+0x2f0/0x390 kernel/kthread.c:388
> > > ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
> > > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
> > > </TASK>
> > > ----------------
> > > Code disassembly (best guess):
> > > 0: f5 cmc
> > > 1: 90 nop
> > > 2: 0f 0b ud2
> > > 4: 90 nop
> > > 5: 90 nop
> > > 6: 90 nop
> > > 7: eb c6 jmp 0xffffffcf
> > > 9: 0f 1f 40 00 nopl 0x0(%rax)
> > > d: 90 nop
> > > e: 90 nop
> > > f: 90 nop
> > > 10: 90 nop
> > > 11: 90 nop
> > > 12: 90 nop
> > > 13: 90 nop
> > > 14: 90 nop
> > > 15: 90 nop
> > > 16: 90 nop
> > > 17: 90 nop
> > > 18: 90 nop
> > > 19: 90 nop
> > > 1a: 90 nop
> > > 1b: 90 nop
> > > 1c: 90 nop
> > > 1d: f3 0f 1e fa endbr64
> > > 21: 31 c0 xor %eax,%eax
> > > 23: 83 3d c7 0f 28 04 00 cmpl $0x0,0x4280fc7(%rip) # 0x4280ff1
> > > * 2a: 74 1e je 0x4a <-- trapping instruction
> > > 2c: 83 3d 26 42 28 04 00 cmpl $0x0,0x4284226(%rip) # 0x4284259
> > > 33: 74 15 je 0x4a
> > > 35: 65 48 8b 0c 25 c0 d3 mov %gs:0x3d3c0,%rcx
> > > 3c: 03 00
> > > 3e: 31 c0 xor %eax,%eax
> > >
> > >
> > > ---
> > > This report is generated by a bot. It may contain errors.
> > > See https://goo.gl/tpsmEJ for more information about syzbot.
> > > syzbot engineers can be reached at syzkaller@xxxxxxxxxxxxxxxx.
> > >
> > > syzbot will keep track of this issue. See:
> > > https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> > >
> > > If the report is already addressed, let syzbot know by replying with:
> > > #syz fix: exact-commit-title
> > >
> > > If you want syzbot to run the reproducer, reply with:
> > > #syz test: git://repo/address.git branch-or-commit-hash
> > > If you attach or paste a git patch, syzbot will apply it before testing.
> > >
> > > If you want to overwrite report's subsystems, reply with:
> > > #syz set subsystems: new-subsystem
> > > (See the list of subsystem names on the web dashboard)
> > >
> > > If the report is a duplicate of another one, reply with:
> > > #syz dup: exact-subject-of-another-report
> > >
> > > If you want to undo deduplication, reply with:
> > > #syz undup
> > >