BUG: soft lockup in smp_call_function

From: Xianying Wang

Date: Thu Nov 20 2025 - 01:48:45 EST


Hi,

I hit a repeatable soft lockup in csd_lock_wait(), reached via
smp_call_function_many_cond(), while running a syzkaller workload in a
KVM guest. It can be triggered by running the attached C reproducer
inside the guest for some time. The reproducer just loops
perf_event_open() + ioctl(PERF_EVENT_IOC_REFRESH) +
socket(AF_INET6, ...) in a child process, while normal userspace
(systemd/journald) is running.

The lockup looks like a cross-CPU TLB flush that never completes. The
stuck CPU is spinning in csd_lock_wait() in kernel/smp.c (inlined into
smp_call_function_many_cond()), with the upper call chain being
flush_tlb_mm_range() → kvm_flush_tlb_multi(), triggered by an ext4
fsync().
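
For readers who do not want to open the pastebin, here is a minimal
sketch of the kind of loop the reproducer runs. It is not the attached
reproducer: the perf event type, config and sample period, the socket
type/protocol, and the helper names repro_iteration() and
run_child_loop() are all assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* glibc provides no perf_event_open() wrapper, so use the raw syscall. */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                            int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hypothetical helper: one pass over the three calls.
 * Returns how many of them succeeded (0..3); failures are tolerated. */
static int repro_iteration(void)
{
    struct perf_event_attr attr;
    int ok = 0;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_SOFTWARE;        /* assumed event type */
    attr.config = PERF_COUNT_SW_CPU_CLOCK; /* assumed event */
    attr.sample_period = 1000000;          /* assumed; REFRESH wants a sampling event */
    attr.disabled = 1;

    int pfd = (int)perf_event_open(&attr, 0, -1, -1, 0);
    if (pfd >= 0) {
        ok++;
        if (ioctl(pfd, PERF_EVENT_IOC_REFRESH, 1) == 0)
            ok++;
        close(pfd);
    }

    int sfd = socket(AF_INET6, SOCK_DGRAM, 0); /* type/protocol assumed */
    if (sfd >= 0) {
        ok++;
        close(sfd);
    }
    return ok;
}

/* Hypothetical helper: run `iterations` passes in a forked child.
 * Returns the child's exit status, or -1 on fork/wait failure. */
static int run_child_loop(int iterations)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (int i = 0; i < iterations; i++)
            repro_iteration();
        _exit(0);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_child_loop() repeatedly from main() gives the same overall
shape. Please use the linked C reproducer for any actual testing.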

Since this is a KVM guest under heavy syzkaller stress, this looks
like a possible race between kvm_flush_tlb_multi() and CPU state (e.g.
CPU hotplug / vCPU offlining, or an incorrect cpumask) in the paravirt
TLB shootdown path, where one target CPU never processes the IPI.
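
To make the suspected failure mode concrete, here is a small userspace
model of the sender/target handshake. It is not kernel code: the
model_* names and MODEL_FLAG_LOCK are hypothetical stand-ins, and the
spin bound exists only so that the model terminates. As far as I
understand, the real csd_lock_wait() keeps spinning until the target
clears the lock flag, so a target that never handles the IPI leaves the
sender stuck.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_FLAG_LOCK 0x01u /* hypothetical stand-in for the CSD lock bit */

struct model_csd {
    atomic_uint flags;
};

/* Sender: mark the request as in flight before "sending the IPI". */
static void model_csd_lock(struct model_csd *csd)
{
    atomic_fetch_or_explicit(&csd->flags, MODEL_FLAG_LOCK,
                             memory_order_relaxed);
}

/* Target: what the IPI handler would do after running the callback. */
static void model_csd_unlock(struct model_csd *csd)
{
    atomic_fetch_and_explicit(&csd->flags, ~MODEL_FLAG_LOCK,
                              memory_order_release);
}

/* Sender: poll the flag up to max_spins times.
 * Returns true if the target released it, false if it never did. */
static bool model_csd_lock_wait(struct model_csd *csd, unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; i++) {
        unsigned int v = atomic_load_explicit(&csd->flags,
                                              memory_order_acquire);
        if (!(v & MODEL_FLAG_LOCK))
            return true;
    }
    return false;
}
```

If model_csd_unlock() is never called, which is the analogue of a
target CPU that never processes the IPI, model_csd_lock_wait() returns
false here. Without the bound, the sender would spin forever.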

This can be reproduced on:

HEAD commit:

e5f0a698b34ed76002dc5cff3804a61c80233a7a

6fab32bb6508abbb8b7b1c5498e44f0c32320ed5

report: https://pastebin.com/raw/Lu4Tz2SH

console output: https://pastebin.com/raw/BxtNEXnq

console output v6.17.0: https://pastebin.com/raw/PBytK7Wq

kernel config: https://pastebin.com/raw/1grwrT16

C reproducer: https://pastebin.com/raw/ySCpMzk2

Let me know if you need more details or testing.

Best regards,

Xianying