[Regression] No IO interrupt is generated before CPU is offline

From: Ming Lei
Date: Thu Apr 16 2020 - 05:03:53 EST


Hi Thomas,

When I run the test script [1] in a KVM guest [2] whose disk is virtio-scsi,
an IO hang is triggered easily. Most of the time it reproduces with a
single run of './cpuhotplug_io 400 /dev/sda'; occasionally it needs one
more run.

Checking the blk-mq debugfs log, I found that the hung requests have
been queued to the virtio-scsi hardware, but no interrupts are
generated for them.

The issue was first found when John and I were testing the patchset [3][4],
which drains IO in the CPU hotplug handler before the CPU and its managed
IRQ are shut down. IOs were found not to complete even though the CPU
responsible for handling this hw queue was still online, though about to
be shut down.

git-bisect shows that the issue was introduced by the following commit:

60dcaad5736f ("x86/hotplug: Silence APIC and NMI when CPU is dead")


The issue can no longer be triggered after applying the following change:

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 69881b2d446c..c5e9f005fbb2 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1596,7 +1596,7 @@ int native_cpu_disable(void)
 	 * it. It still responds normally to INIT, NMI, SMI, and SIPI
 	 * messages.
 	 */
-	apic_soft_disable();
+	clear_local_APIC();
 	cpu_disable_common();
 
 	return 0;


[1] test script
http://people.redhat.com/minlei/tests/tools/cpuhotplug_io

[2] virtio-scsi is made multi-queue by passing 'num_queues=3' to the qemu
virtio-scsi device on the command line, and the number of vCPUs is set to
8, so each queue is mapped to more than one CPU

[3] https://lore.kernel.org/linux-block/20200407092901.314228-5-ming.lei@xxxxxxxxxx/

[4] latest patches for stopping and draining IO before shutting down the IRQ/CPU
https://github.com/ming1/linux/commits/v5.6-blk-mq-improve-cpu-hotplug



Thanks,
Ming