[PATCH 0/1] genirq/cpuhotplug: fix CPU hotplug set affinity failure issue

From: Dongli Zhang
Date: Thu Apr 18 2024 - 21:36:03 EST


Please refer to the commit message of the patch for details.

This cover letter demonstrates how to reproduce the issue deliberately with
QEMU/KVM + virtio-net (which is why virtualization@xxxxxxxxxxxxxxx is CCed).

Thank you very much!

------------------------

1. Build the mainline Linux kernel.

$ make defconfig
$ scripts/config --file ".config" -e CONFIG_X86_X2APIC \
-e CONFIG_GENERIC_IRQ_DEBUGFS
$ make olddefconfig
$ make -j24 > /dev/null

Confirm that the config option is enabled.

$ cat .config | grep CONFIG_GENERIC_IRQ_DEBUGFS
CONFIG_GENERIC_IRQ_DEBUGFS=y
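
Both options passed to scripts/config above can be checked in one go. The
snippet below is only a sketch; check_config is a hypothetical helper name and
not something shipped in the kernel tree.

```shell
#!/bin/sh
# check_config FILE OPTION...
# Hypothetical helper: print whether each OPTION is set to "y" in FILE.
# Returns non-zero if at least one option is not enabled.
check_config() {
    config_file=$1
    shift
    rc=0
    for opt in "$@"; do
        if grep -q "^${opt}=y\$" "$config_file"; then
            echo "$opt: enabled"
        else
            echo "$opt: MISSING"
            rc=1
        fi
    done
    return $rc
}

# Example (run from the top of the kernel build directory):
#   check_config .config CONFIG_X86_X2APIC CONFIG_GENERIC_IRQ_DEBUGFS
```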


2. Create the VM with the QEMU command line below. The libvirt bridge virbr0 is
used as the bridge for virtio-net.

-------------------
$ cat qemu-ifup
#!/bin/sh
# Script to bring a network (tap) device for qemu up.

br="virbr0"
ifconfig "$1" up
brctl addif "$br" "$1"
exit
-------------------

/home/zhang/kvm/qemu-8.2.0/build/qemu-system-x86_64 \
-hda ubuntu2204.qcow2 -m 8G -smp 32 -vnc :5 -enable-kvm -cpu host \
-net nic -net user,hostfwd=tcp::5025-:22 \
-device virtio-net-pci,netdev=tapnet01,id=net01,mac=01:54:00:12:34:56,bus=pci.0,addr=0x4,mq=true,vectors=257 \
-netdev tap,id=tapnet01,ifname=tap01,script=qemu-ifup,downscript=no,queues=128,vhost=off \
-device virtio-net-pci,netdev=tapnet02,id=net02,mac=02:54:00:12:34:56,bus=pci.0,addr=0x5,mq=true,vectors=257 \
-netdev tap,id=tapnet02,ifname=tap02,script=qemu-ifup,downscript=no,queues=128,vhost=off \
-kernel /home/zhang/img/debug/mainline-linux/arch/x86_64/boot/bzImage \
-append "root=/dev/sda3 init=/sbin/init text loglevel=7 console=ttyS0" \
-serial stdio -name debug-threads=on


3. Use procfs to confirm the virtio IRQ numbers.

$ cat /proc/interrupts | grep virtio
24: ... ... PCI-MSIX-0000:00:04.0 0-edge virtio0-config
25: ... ... PCI-MSIX-0000:00:04.0 1-edge virtio0-input.0
.. ...
537: ... ... PCI-MSIX-0000:00:05.0 256-edge virtio1-output.127

Set the affinity of IRQs 25-537 to CPUs 2 and 3.

-------------------
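#!/bin/bash
# The {25..537} brace expansion is a bash feature; it is not expanded by a
# plain POSIX /bin/sh such as dash.

for irq in {25..537}
do
    echo $irq
    echo 2,3 > /proc/irq/$irq/smp_affinity_list
    cat /proc/irq/$irq/smp_affinity_list
    cat /proc/irq/$irq/effective_affinity_list
    echo ""
done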
-------------------
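
The 25-537 range is specific to this VM. As a sketch, the IRQ numbers can be
derived from /proc/interrupts instead of being hardcoded. virtio_queue_irqs is
a hypothetical helper name. It selects only the input/output queue interrupts,
so any config interrupt that falls inside the numeric range is skipped.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/interrupts-style lines on stdin and print
# the IRQ number of every line whose last field (the action name) looks like
# virtioN-input.M or virtioN-output.M.
virtio_queue_irqs() {
    awk '$NF ~ /^virtio[0-9]+-(input|output)\.[0-9]+$/ {
        sub(":", "", $1)    # "25:" -> "25"
        print $1
    }'
}

# Example (as root, inside the guest):
#   for irq in $(virtio_queue_irqs < /proc/interrupts); do
#       echo 2,3 > /proc/irq/$irq/smp_affinity_list
#   done
```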

Now offline CPUs 8-31.

-------------------
#!/bin/bash
# The {8..31} brace expansion requires bash.

for cpu in {8..31}
do
    echo $cpu
    echo 0 > /sys/devices/system/cpu/cpu$cpu/online
done
-------------------


Below is the current state of the VECTOR domain in debugfs.

# cat /sys/kernel/debug/irq/domains/VECTOR
name: VECTOR
size: 0
mapped: 529
flags: 0x00000103
Online bitmaps: 8
Global available: 1090
Global reserved: 6
Total allocated: 536
System: 36: 0-19,21,50,128,236,243-244,246-255
 | CPU | avl | man | mac | act | vectors
     0   169     0     0    33  32-49,51-65
     1   171     0     0    31  32-49,51-63
     2    26     0     0   176  32-49,52-127,129-210
     3    27     0     0   175  32-49,51-127,129-171,173-209
     4   175     0     0    27  32-49,51-59
     5   175     0     0    27  32-49,51-59
     6   172     0     0    30  32-49,51-62
     7   175     0     0    27  32-49,51-59
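
In the dump above, the per-CPU "avl" values add up to the "Global available"
count (1090). A rough way to extract the per-CPU numbers from such a dump is
sketched below. vector_summary is a hypothetical helper name, and the column
order (CPU, avl, man, mac, act) is assumed to match the header shown above.

```shell
#!/bin/sh
# Hypothetical helper: read a VECTOR domain dump on stdin and print
# "cpu<N> avl=<available> act=<active>" for each per-CPU row, followed by
# the sum of the avl column.
vector_summary() {
    awk '
        /\| CPU \|/ { in_table = 1; next }    # per-CPU rows follow the header
        in_table && NF >= 5 && $1 ~ /^[0-9]+$/ {
            printf "cpu%d avl=%d act=%d\n", $1, $2, $5
            total += $2
        }
        END { printf "total avl=%d\n", total }
    '
}

# Example (as root, inside the guest):
#   vector_summary < /sys/kernel/debug/irq/domains/VECTOR
```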


4. Now offline CPU 3.

# echo 0 > /sys/devices/system/cpu/cpu3/online

The following messages appear in dmesg (-28 is -ENOSPC).

[ 96.234045] IRQ151: set affinity failed(-28).
[ 96.234064] IRQ156: set affinity failed(-28).
[ 96.234078] IRQ158: set affinity failed(-28).
[ 96.234091] IRQ159: set affinity failed(-28).
[ 96.234105] IRQ161: set affinity failed(-28).
[ 96.234118] IRQ162: set affinity failed(-28).
[ 96.234132] IRQ163: set affinity failed(-28).
[ 96.234145] IRQ164: set affinity failed(-28).
[ 96.234159] IRQ165: set affinity failed(-28).
[ 96.234172] IRQ166: set affinity failed(-28).
[ 96.235013] IRQ fixup: irq 339 move in progress, old vector 48
[ 96.237129] smpboot: CPU 3 is now offline
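
To follow up on the failed IRQs, their numbers can be pulled out of the log
with something like the sketch below. failed_affinity_irqs is a hypothetical
helper name, and the message format is assumed to be exactly the one shown
above.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print
# "<irq> <error code>" for each "IRQ<n>: set affinity failed(<code>)." line.
failed_affinity_irqs() {
    sed -n 's/.*IRQ\([0-9][0-9]*\): set affinity failed(\(-\{0,1\}[0-9][0-9]*\)).*/\1 \2/p'
}

# Example (as root, inside the guest): show the affinity of each failed IRQ.
#   dmesg | failed_affinity_irqs | while read -r irq err; do
#       echo "IRQ $irq (error $err)"
#       cat /proc/irq/$irq/smp_affinity_list
#       cat /proc/irq/$irq/effective_affinity_list
#   done
```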


Although the other CPUs have many available vectors, only CPU 2 is used as the
target, and it has no available vectors left (avl is 0).

# cat /sys/kernel/debug/irq/domains/VECTOR
name: VECTOR
size: 0
mapped: 529
flags: 0x00000103
Online bitmaps: 7
Global available: 1022
Global reserved: 6
Total allocated: 533
System: 36: 0-19,21,50,128,236,243-244,246-255
 | CPU | avl | man | mac | act | vectors
     0   168     0     0    34  32-49,51-53,55-57,59-68
     1   165     0     0    37  32-49,51-57,59-60,64-73
     2     0     0     0   202  32-49,51-127,129-235
     4   173     0     0    29  32-40,42-48,52-63,65
     5   171     0     0    31  32-49,51-54,56,58-62,64-66
     6   172     0     0    30  32-49,51-52,54-57,59-63,65
     7   173     0     0    29  32-49,51-52,54-58,60-62,64


Dongli Zhang