Kernel-managed IRQ affinity (cont)

From: Peter Xu
Date: Mon Dec 16 2019 - 14:57:20 EST


Hi, Thomas,

(Sorry, I must have lost the earlier discussion during an email
migration, so I'm starting a new thread.)

This continues the previous discussion on kernel-managed IRQ
affinity [1]. I think the conclusion at that time was that we didn't
have a usage scenario that justified changing the current policy [2].
However, I recently noticed that this is probably a fundamental
requirement for some real-time scenarios, even when multi-queue is
not involved.

My test case is a fairly typical real-time guest with 10 vcpus: 0-1
are housekeeping vcpus and 2-9 are real-time vcpus. The guest has one
virtio-blk device as its boot disk. With a distribution very close to
the latest upstream, we can observe high latency spikes, probably due
to the IRQs.

To guarantee real-time responsiveness, we need the IRQs to be
manageable. Say, when I run a real-time workload on vcpu9, we should
be able to move all the IRQs from vcpu9 to the other vcpus (most
probably vcpu0 and vcpu1). However, with kernel-managed IRQs we can't
write a new mask to /proc/irq/N/smp_affinity. Here, vcpu9 gets IRQ 38
from the virtio-blk device:

# cat /proc/interrupts | grep -w 38
38: 0 0 0 0 0 0 0 0 0 15206 PCI-MSI 2621441-edge virtio2-req.0
# cat /proc/irq/38/smp_affinity
3ff
# cat /proc/irq/38/effective_affinity
200
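
For readers less used to hex CPU masks, here is a minimal sketch of how
the two values above decode. The helper name mask_to_cpus is made up
for this mail and is not an existing tool. It only does arithmetic on
the mask string and does not touch /proc.

```shell
# Hypothetical helper: print the CPU numbers set in a hex affinity mask
# such as "3ff" or "200". Commas (which the kernel uses to separate
# 32-bit words in long masks) are dropped before decoding.
mask_to_cpus() {
    mask=$(printf '%s' "$1" | tr -d ',')
    cpu=0
    out=""
    # Walk the hex digits from least significant to most significant.
    while [ -n "$mask" ]; do
        rest=${mask%?}           # everything except the last digit
        digit=${mask#"$rest"}    # the last digit
        mask=$rest
        val=$((0x$digit))
        for bit in 0 1 2 3; do
            if [ $(( (val >> bit) & 1 )) -eq 1 ]; then
                out="$out${out:+ }$((cpu + bit))"
            fi
        done
        cpu=$((cpu + 4))
    done
    printf '%s\n' "$out"
}

mask_to_cpus 3ff   # smp_affinity above: CPUs 0 through 9
mask_to_cpus 200   # effective_affinity above: CPU 9 only
```

So the allowed mask covers all ten vcpus, while the IRQ is actually
delivered to vcpu9, which is one of the real-time vcpus.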

Meanwhile, I don't think there's anything special about VMs here, so
this issue should also exist on hosts, as long as the IRQ is managed
in the same way as for the virtio-blk device here.

As Ming mentioned in the previous discussion [3], I think it would at
least be good if the kernel IRQ subsystem could respect "irqaffinity="
when assigning IRQs to the cores. Currently it does not. What would
you suggest in this case? Do you think this is a valid user scenario?
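
To make the expectation concrete: for the guest above, one would
presumably boot with something like "irqaffinity=0-1" (an assumed
setting for illustration, not taken from my actual test setup), and
then expect IRQs to land only on the housekeeping vcpus. A rough sketch
of the corresponding masks, using a made-up helper cpus_to_mask:

```shell
# Hypothetical helper: convert a CPU list such as "0-1" or "0,1,4-6"
# into a hex affinity mask. Sketch only: assumes CPU numbers below 63
# so that the mask fits into shell arithmetic.
cpus_to_mask() {
    mask=0
    # Split the list on commas into positional parameters.
    old_ifs=$IFS
    IFS=','
    set -- $1
    IFS=$old_ifs
    for part in "$@"; do
        first=${part%-*}    # start of the range (or the single CPU)
        last=${part#*-}     # end of the range (or the single CPU)
        cpu=$first
        while [ "$cpu" -le "$last" ]; do
            mask=$((mask | (1 << cpu)))
            cpu=$((cpu + 1))
        done
    done
    printf '%x\n' "$mask"
}

cpus_to_mask 0-1   # housekeeping vcpus: prints 3
cpus_to_mask 2-9   # real-time vcpus: prints 3fc
```

With that setting the desired affinity for IRQ 38 would be 3, whereas
the effective_affinity shown above is 200.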

Thanks,

[1] https://lkml.org/lkml/2019/3/18/15
[2] https://lkml.org/lkml/2019/3/25/562
[3] https://lkml.org/lkml/2019/3/25/308

--
Peter Xu