Re: IRQ affinity not working on Xen pci-platform device

From: David Woodhouse
Date: Fri Mar 03 2023 - 11:56:19 EST


On Fri, 2023-03-03 at 17:51 +0100, Thomas Gleixner wrote:
> David!
>
> On Fri, Mar 03 2023 at 15:16, David Woodhouse wrote:
> > I added the 'xen_no_vector_callback' kernel parameter a while back
> > (commit b36b0fe96af) to ensure we could test that more for Linux
> > guests.
> >
> > Most of my testing at the time was done with just two CPUs, and I
> > happened to just test it with four. It fails, because the IRQ isn't
> > actually affine to CPU0.
> >
> > I tried making it work anyway (in line with the comment in platform-
> > pci.c which says that it shouldn't matter if it *runs* on CPU0 as long
> > as it processes events *for* CPU0). That didn't seem to work.
> >
> > If I put the irq_set_affinity() call *before* the request_irq() that
> > does actually work. But it's setting affinity on an IRQ it doesn't even
> > own yet.
>
> The core allows it for raisins. See below... :)
>
> > Test hacks below; this is testable with today's QEMU master (yay!) and:
> >
> >   qemu-system-x86_64 -display none -serial mon:stdio -smp 4 \
> >      -accel kvm,xen-version=0x4000a,kernel-irqchip=split \
> >      -kernel ~/git/linux/arch/x86/boot//bzImage \
> >      -append "console=ttyS0,115200 xen_no_vector_callback"
> >
> > ...
> >
> > [    0.577173] ACPI: \_SB_.LNKC: Enabled at IRQ 11
> > [    0.578149] The affinity mask was 0-3
> > [    0.579081] The affinity mask is 0-3 and the handler is on 2
> > [    0.580288] The affinity mask is 0 and the handler is on 2
>
> What happens is that once the interrupt is requested, the affinity
> setting is deferred to the first interrupt. See the marvelous dance in
> arch/x86/kernel/apic/msi.c::msi_set_affinity().
>
> If you do the setting before request_irq() then the startup will assign
> it to the target mask right away.
>
> Btw, you are using irq_get_affinity_mask(), which gives you the desired
> target mask. irq_get_effective_affinity_mask() gives you the real one.
>
> Can you verify that the thing moves over after the first interrupt or is
> that too late already?

It doesn't seem to move. The hack to just return IRQ_NONE if invoked on
CPU != 0 was intended to do just that. It's a level-triggered interrupt
so when the handler does nothing on the "wrong" CPU, it ought to get
invoked again on the *correct* CPU and actually work that time.

But no, as the above logs show, it gets invoked twice, *both* times on CPU2.

If I do the setting before request_irq() then it should assign it right
away (unless that IRQ was already in use? It's theoretically a shared
PCI INTx line). But even then, that would mean I'm messing with
affinity on an IRQ I haven't even requested yet and don't own?
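
For cross-checking the desired mask against the effective one from
userspace, something like the sketch below could be used. It assumes
the guest has a userspace at all, and that the kernel exposes
/proc/irq/<N>/smp_affinity_list and /proc/irq/<N>/effective_affinity_list
(the latter depends on the kernel configuration). The helper names
read_first_line() and irq_affinity_matches() are made up for
illustration; they are not kernel or libc APIs.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of a file into buf and strip the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or is empty.
 */
static int read_first_line(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/*
 * Compare the requested and effective CPU lists of one IRQ.
 * procdir is normally "/proc/irq"; it is a parameter only so that the
 * helper can be pointed at a different directory for testing.
 * Returns 1 if the two lists are textually identical, 0 if they differ,
 * -1 if either file could not be read.
 */
static int irq_affinity_matches(const char *procdir, int irq)
{
	char path[256], want[256], got[256];

	snprintf(path, sizeof(path), "%s/%d/smp_affinity_list", procdir, irq);
	if (read_first_line(path, want, sizeof(want)))
		return -1;

	snprintf(path, sizeof(path), "%s/%d/effective_affinity_list",
		 procdir, irq);
	if (read_first_line(path, got, sizeof(got)))
		return -1;

	printf("IRQ %d: requested %s, effective %s\n", irq, want, got);
	return strcmp(want, got) == 0;
}
```

Calling irq_affinity_matches("/proc/irq", 11) would print both lists for
the IRQ 11 seen in the log above. The comparison is purely textual, so
it only says whether the two lists are identical, not which CPU the
handler last ran on.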
