Re: [syzbot] possible deadlock in scheduler_tick

From: Paolo Bonzini
Date: Wed Mar 24 2021 - 08:19:57 EST


On 24/03/21 12:34, Wanpeng Li wrote:
Cc David Woodhouse,
On Wed, 24 Mar 2021 at 18:11, syzbot
<syzbot+b282b65c2c68492df769@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:

Hello,

syzbot found the following issue on:

HEAD commit: 1c273e10 Merge tag 'zonefs-5.12-rc4' of git://git.kernel.o..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=13c0414ed00000
kernel config: https://syzkaller.appspot.com/x/.config?x=6abda3336c698a07
dashboard link: https://syzkaller.appspot.com/bug?extid=b282b65c2c68492df769
userspace arch: i386
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17d86ad6d00000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=17b8497cd00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+b282b65c2c68492df769@xxxxxxxxxxxxxxxxxxxxxxxxx

=====================================================
WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
5.12.0-rc3-syzkaller #0 Not tainted
-----------------------------------------------------
syz-executor030/8435 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
ffffc90001a2a230 (&kvm->arch.pvclock_gtod_sync_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:354 [inline]
ffffc90001a2a230 (&kvm->arch.pvclock_gtod_sync_lock){+.+.}-{2:2}, at: get_kvmclock_ns+0x25/0x390 arch/x86/kvm/x86.c:2587

and this task is already holding:
ffff8880b9d35198 (&rq->lock){-.-.}-{2:2}, at: rq_lock kernel/sched/sched.h:1321 [inline]
ffff8880b9d35198 (&rq->lock){-.-.}-{2:2}, at: __schedule+0x21c/0x21b0 kernel/sched/core.c:4990
which would create a new lock dependency:
(&rq->lock){-.-.}-{2:2} -> (&kvm->arch.pvclock_gtod_sync_lock){+.+.}-{2:2}

but this new dependency connects a HARDIRQ-irq-safe lock:
(&rq->lock){-.-.}-{2:2}

... which became HARDIRQ-irq-safe at:
lock_acquire kernel/locking/lockdep.c:5510 [inline]
lock_acquire+0x1ab/0x740 kernel/locking/lockdep.c:5475
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
rq_lock kernel/sched/sched.h:1321 [inline]
scheduler_tick+0xa4/0x4b0 kernel/sched/core.c:4538
update_process_times+0x191/0x200 kernel/time/timer.c:1801
tick_periodic+0x79/0x230 kernel/time/tick-common.c:100
tick_handle_periodic+0x41/0x120 kernel/time/tick-common.c:112
timer_interrupt+0x3f/0x60 arch/x86/kernel/time.c:57
__handle_irq_event_percpu+0x303/0x8f0 kernel/irq/handle.c:156
handle_irq_event_percpu kernel/irq/handle.c:196 [inline]
handle_irq_event+0x102/0x290 kernel/irq/handle.c:213
handle_level_irq+0x256/0x6e0 kernel/irq/chip.c:650
generic_handle_irq_desc include/linux/irqdesc.h:158 [inline]
handle_irq arch/x86/kernel/irq.c:231 [inline]
__common_interrupt+0x9e/0x200 arch/x86/kernel/irq.c:250
common_interrupt+0x9f/0xd0 arch/x86/kernel/irq.c:240
asm_common_interrupt+0x1e/0x40 arch/x86/include/asm/idtentry.h:623
__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:161 [inline]
_raw_spin_unlock_irqrestore+0x38/0x70 kernel/locking/spinlock.c:191
__setup_irq+0xc72/0x1ce0 kernel/irq/manage.c:1737
request_threaded_irq+0x28a/0x3b0 kernel/irq/manage.c:2127
request_irq include/linux/interrupt.h:160 [inline]
setup_default_timer_irq arch/x86/kernel/time.c:70 [inline]
hpet_time_init+0x28/0x42 arch/x86/kernel/time.c:82
x86_late_time_init+0x58/0x94 arch/x86/kernel/time.c:94
start_kernel+0x3ee/0x496 init/main.c:1028
secondary_startup_64_no_verify+0xb0/0xbb

to a HARDIRQ-irq-unsafe lock:
(&kvm->arch.pvclock_gtod_sync_lock){+.+.}-{2:2}

... which became HARDIRQ-irq-unsafe at:
...
lock_acquire kernel/locking/lockdep.c:5510 [inline]
lock_acquire+0x1ab/0x740 kernel/locking/lockdep.c:5475
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:354 [inline]
kvm_synchronize_tsc+0x459/0x1230 arch/x86/kvm/x86.c:2332
kvm_arch_vcpu_postcreate+0x73/0x180 arch/x86/kvm/x86.c:10183
kvm_vm_ioctl_create_vcpu arch/x86/kvm/../../../virt/kvm/kvm_main.c:3239 [inline]
kvm_vm_ioctl+0x1b2d/0x2800 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3839
kvm_vm_compat_ioctl+0x125/0x230 arch/x86/kvm/../../../virt/kvm/kvm_main.c:4052
__do_compat_sys_ioctl+0x1d3/0x230 fs/ioctl.c:842
do_syscall_32_irqs_on arch/x86/entry/common.c:77 [inline]
__do_fast_syscall_32+0x56/0x90 arch/x86/entry/common.c:140
do_fast_syscall_32+0x2f/0x70 arch/x86/entry/common.c:165
entry_SYSENTER_compat_after_hwframe+0x4d/0x5c

other info that might help us debug this:

Possible interrupt unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&kvm->arch.pvclock_gtod_sync_lock);
local_irq_disable();
lock(&rq->lock);
lock(&kvm->arch.pvclock_gtod_sync_lock);
<Interrupt>
lock(&rq->lock);


The offender is get_kvmclock_ns() which is called in the context
switch process. The bad commit is 30b5c851af7991ad0 ("KVM: x86/xen:
Add support for vCPU runstate information").


I'll send a patch, thanks.

Paolo