Re: PCI MSI issue for maxcpus=1

From: Marc Zyngier
Date: Sat Mar 05 2022 - 10:40:34 EST


[+ David, who was chasing something similar]

Hi John,

On Fri, 04 Mar 2022 12:53:31 +0000,
John Garry <john.garry@xxxxxxxxxx> wrote:
>
> > ...
>
> >
> > [ 7.961007]  valid_col+0x14/0x24
> > [ 7.964223]  its_send_single_command+0x4c/0x150
> > [ 7.968741]  its_irq_domain_activate+0xc8/0x104
> > [ 7.973259]  __irq_domain_activate_irq+0x5c/0xac
> > [ 7.977865]  __irq_domain_activate_irq+0x38/0xac
> > [ 7.982471]  irq_domain_activate_irq+0x3c/0x64
> > [ 7.986902]  __msi_domain_alloc_irqs+0x1a8/0x2f4
> > [ 7.991507]  msi_domain_alloc_irqs+0x20/0x2c
> > [ 7.995764]  __pci_enable_msi_range+0x2ec/0x590
> > [ 8.000284]  pci_alloc_irq_vectors_affinity+0xe0/0x140
> > [ 8.005410]  hisi_sas_v3_probe+0x300/0xbe0
> > [ 8.009494]  local_pci_probe+0x44/0xb0
> > [ 8.013232]  work_for_cpu_fn+0x20/0x34
> > [ 8.016969]  process_one_work+0x1d0/0x354
> > [ 8.020966]  worker_thread+0x2c0/0x470
> > [ 8.024703]  kthread+0x17c/0x190
> > [ 8.027920]  ret_from_fork+0x10/0x20
> > [ 8.031485] ---[ end trace bb67cfc7eded7361 ]---
> >
>
> ...
>
> > Ah, of course. The CPU hasn't booted yet, so its collection isn't
> > mapped. I was hoping that the core code would keep the interrupt in
> > shutdown state, but it doesn't seem to be the case...
> >
> > > Apart from this, I assume that if another cpu comes online later in
> > > the affinity mask I would figure that we want to target the irq to
> > > that cpu (which I think we would not do here).
> >
> > That's probably also something that should come from core code, as
> > we're not really in a position to decide this in the ITS driver.
> > .
>
>
> Hi Marc,
>
> Have you had a chance to consider this issue further?
>
> So I think that x86 avoids this issue as it uses matrix.c, which
> handles CPUs being offline when selecting target CPUs for managed
> interrupts.
>
> So is your idea still that core code should keep the interrupt in
> shutdown state (for no CPUs online in affinity mask)?

Yup. I came up with this:

diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 2bdfce5edafd..97e9eb9aecc6 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -823,6 +823,19 @@ static int msi_init_virq(struct irq_domain *domain, int virq, unsigned int vflag
 	if (!(vflags & VIRQ_ACTIVATE))
 		return 0;
 
+	if (!(vflags & VIRQ_CAN_RESERVE)) {
+		/*
+		 * If the interrupt is managed but no CPU is available
+		 * to service it, shut it down until better times.
+		 */
+		if (irqd_affinity_is_managed(irqd) &&
+		    !cpumask_intersects(irq_data_get_affinity_mask(irqd),
+					cpu_online_mask)) {
+			irqd_set_managed_shutdown(irqd);
+			return 0;
+		}
+	}
+
 	ret = irq_domain_activate_irq(irqd, vflags & VIRQ_CAN_RESERVE);
 	if (ret)
 		return ret;
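
To make the decision in the hunk above easier to follow, here is a small user-space sketch of the same logic. It is a model only, not kernel code: the 64-bit masks stand in for struct cpumask, the flag values are illustrative, and model_init_virq() and the virq_action enum are hypothetical names that exist only in this sketch.

```c
/* User-space model of the activation decision; NOT kernel code. */
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values for this sketch only. */
#define VIRQ_CAN_RESERVE	(1u << 0)
#define VIRQ_ACTIVATE		(1u << 1)

/* Hypothetical result type: what the modelled function would do next. */
enum virq_action {
	VIRQ_SKIP,		/* VIRQ_ACTIVATE not requested */
	VIRQ_MANAGED_SHUTDOWN,	/* managed, no online CPU in the affinity mask */
	VIRQ_DO_ACTIVATE,	/* would go on to irq_domain_activate_irq() */
};

/* One bit per CPU stands in for struct cpumask (assumption of this model). */
static enum virq_action model_init_virq(unsigned int vflags, bool managed,
					uint64_t affinity, uint64_t online)
{
	if (!(vflags & VIRQ_ACTIVATE))
		return VIRQ_SKIP;

	/* Mirrors the added check: managed and no overlap with online CPUs. */
	if (!(vflags & VIRQ_CAN_RESERVE) && managed && !(affinity & online))
		return VIRQ_MANAGED_SHUTDOWN;

	return VIRQ_DO_ACTIVATE;
}
```

With only CPU0 online (online == 0x1), a managed interrupt whose affinity mask is CPU2 only (0x4) yields VIRQ_MANAGED_SHUTDOWN in this model, while one affine to CPU0 yields VIRQ_DO_ACTIVATE.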

With this in place, I get the following results (VM booted with 4
vcpus and maxcpus=1, the virtio device is using managed interrupts):

root@debian:~# cat /proc/interrupts
           CPU0
 10:       2298     GICv3    27 Level     arch_timer
 12:         84     GICv3    33 Level     uart-pl011
 49:          0     GICv3    41 Edge      ACPI:Ged
 50:          0   ITS-MSI 16384 Edge      virtio0-config
 51:       2088   ITS-MSI 16385 Edge      virtio0-req.0
 52:          0   ITS-MSI 16386 Edge      virtio0-req.1
 53:          0   ITS-MSI 16387 Edge      virtio0-req.2
 54:          0   ITS-MSI 16388 Edge      virtio0-req.3
 55:      11641   ITS-MSI 32768 Edge      xhci_hcd
 56:          0   ITS-MSI 32769 Edge      xhci_hcd
IPI0:         0       Rescheduling interrupts
IPI1:         0       Function call interrupts
IPI2:         0       CPU stop interrupts
IPI3:         0       CPU stop (for crash dump) interrupts
IPI4:         0       Timer broadcast interrupts
IPI5:         0       IRQ work interrupts
IPI6:         0       CPU wake-up interrupts
Err:          0
root@debian:~# echo 1 >/sys/devices/system/cpu/cpu2/online
root@debian:~# cat /proc/interrupts
           CPU0       CPU2
 10:       2530         90     GICv3    27 Level     arch_timer
 12:        103          0     GICv3    33 Level     uart-pl011
 49:          0          0     GICv3    41 Edge      ACPI:Ged
 50:          0          0   ITS-MSI 16384 Edge      virtio0-config
 51:       2097          0   ITS-MSI 16385 Edge      virtio0-req.0
 52:          0          0   ITS-MSI 16386 Edge      virtio0-req.1
 53:          0         12   ITS-MSI 16387 Edge      virtio0-req.2
 54:          0          0   ITS-MSI 16388 Edge      virtio0-req.3
 55:      13487          0   ITS-MSI 32768 Edge      xhci_hcd
 56:          0          0   ITS-MSI 32769 Edge      xhci_hcd
IPI0:        38         45       Rescheduling interrupts
IPI1:         3          3       Function call interrupts
IPI2:         0          0       CPU stop interrupts
IPI3:         0          0       CPU stop (for crash dump) interrupts
IPI4:         0          0       Timer broadcast interrupts
IPI5:         0          0       IRQ work interrupts
IPI6:         0          0       CPU wake-up interrupts
Err:          0
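
The second listing shows virtio0-req.2 starting to count on CPU2 once that CPU is onlined. The sketch below is a rough user-space model of that transition, on the assumption that the CPU hotplug path restarts a managed interrupt left in the managed-shutdown state when a CPU from its affinity mask comes online. struct model_irq and model_cpu_online() are hypothetical names for illustration and do not correspond to kernel symbols.

```c
/* User-space model of a managed interrupt being restarted; NOT kernel code. */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-interrupt state relevant here. */
struct model_irq {
	uint64_t affinity;	/* one bit per CPU, so cpu must be < 64 */
	bool managed;
	bool managed_shutdown;
	bool started;
};

/* Returns true if onlining this CPU started the interrupt. */
static bool model_cpu_online(struct model_irq *irq, unsigned int cpu)
{
	/* Only managed interrupts affine to the new CPU are of interest. */
	if (!irq->managed || !(irq->affinity & (UINT64_C(1) << cpu)))
		return false;

	/* Already running on another CPU of the mask: nothing to start. */
	if (!irq->managed_shutdown)
		return false;

	irq->managed_shutdown = false;
	irq->started = true;
	return true;
}
```

In this model, an interrupt affine to CPU2 only stays shut down when CPU1 is onlined and is started when CPU2 is onlined, which is consistent with the counts above.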

Would this solve your problem?

Thanks,

M.

--
Without deviation from the norm, progress is not possible.