Re: [PROBLEM] Frequently get "irq 31: nobody cared" when passing through 2x GPUs that share same pci switch via vfio

From: Matthew Ruffell
Date: Tue Oct 12 2021 - 18:35:37 EST


Hi Alex,

On Wed, Oct 13, 2021 at 9:05 AM Alex Williamson
<alex.williamson@xxxxxxxxxx> wrote:
> On Tue, 12 Oct 2021 17:58:07 +1300
> Matthew Ruffell <matthew.ruffell@xxxxxxxxxxxxx> wrote:
> > Nathan does have a small side effect to report:
> >
> > > The only thing close to an issue that I have is that I still get frequent
> > > "irq 112: nobody cared" and "Disabling IRQ #112" errors. They just no longer
> > > lock up the system. If I watch the reproducer time between VM resets, I've
> > > noticed that it takes longer for the VM to startup after one of these
> > > "nobody cared" errors, and thus it takes longer until I can reset the VM again.
> > > I believe slow guest behavior in this disabled IRQ scenario is expected though?
>
> The device might need to be operating in INTx mode, or at least had
> been at some point, to get the register filled. It's essentially just
> a scratch register on the card that gets filled when the interrupt is
> configured.
>
> Each time we register a new handler for the irq the masking due to
> spurious interrupt will be removed, but if it's actually causing the VM
> boot to take longer that suggests to me that the guest driver is
> stalled, perhaps because it's expecting an interrupt that's now masked
> in the host. This could also be caused by a device that gets
> incorrectly probed for PCI-2.3 compliant interrupt masking. For
> probing we can really only test that we have the ability to set the
> DisINTx bit, we can only hope that the hardware folks also properly
> implemented the INTx status bit to indicate the device is signaling
> INTx. We should really figure out which device this is so that we can
> focus on whether it's another shared interrupt issue or something
> specific to the device.
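
That description matches what Nathan sees: after a storm of unhandled
interrupts the host masks the line, and things recover once the VM restarts
and a new handler is registered. For anyone following the downstream bug who
hasn't looked at the genirq side, here is a deliberately simplified model of
that accounting, based on my reading of kernel/irq/spurious.c rather than the
actual code (the window/threshold numbers are approximate):

/* Very rough model of the genirq spurious interrupt accounting, written
 * from my reading of kernel/irq/spurious.c -- illustration only, not the
 * actual kernel code, and the window/threshold numbers are approximate. */
#include <stdbool.h>
#include <stdio.h>

#define WINDOW          100000  /* interrupts per accounting window */
#define UNHANDLED_LIMIT  99900  /* unhandled count that trips "nobody cared" */

struct irq_model {
	unsigned int irq;
	unsigned int count;      /* interrupts seen in this window */
	unsigned int unhandled;  /* interrupts no handler claimed */
	bool disabled;           /* masked after "nobody cared" */
};

/* One interrupt arrives; handled == false means every handler sharing the
 * line returned IRQ_NONE. */
static void note_interrupt_model(struct irq_model *d, bool handled)
{
	if (d->disabled)
		return;

	if (!handled)
		d->unhandled++;

	if (++d->count < WINDOW)
		return;

	if (d->unhandled > UNHANDLED_LIMIT) {
		printf("irq %u: nobody cared\n", d->irq);
		printf("Disabling IRQ #%u\n", d->irq);
		d->disabled = true;  /* stays masked on the host... */
	}
	d->count = 0;
	d->unhandled = 0;
}

/* ...until a new handler is registered (e.g. vfio re-requests the line when
 * the VM is reset), which clears the spurious-disable state. */
static void setup_irq_model(struct irq_model *d)
{
	d->disabled = false;
	d->count = 0;
	d->unhandled = 0;
}

int main(void)
{
	struct irq_model d = { .irq = 112 };
	unsigned int i;

	for (i = 0; i < 2 * WINDOW; i++)   /* a storm nobody claims */
		note_interrupt_model(&d, false);

	setup_irq_model(&d);               /* VM reset lifts the masking */
	printf("irq %u masked after re-setup: %s\n", d.irq,
	       d.disabled ? "yes" : "no");
	return 0;
}

In other words, every "nobody cared" window leaves the line masked until the
next handler registration, which lines up with the VM taking longer to come
back after one of those errors.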

Nathan got back to me, and the device is the same GPU audio controller pair
from before, so it might be another shared interrupt issue, since both audio
functions share IRQ 112.

$ sudo lspci -vvv | grep "IRQ 112" -B 5 -A 10
88:00.1 Audio device: NVIDIA Corporation TU102 High Definition Audio Controller (rev a1)
        Subsystem: eVga.com. Corp. TU102 High Definition Audio Controller
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin B routed to IRQ 112
        NUMA node: 1
        Region 0: Memory at f7080000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [60] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000 Data: 0000
        Capabilities: [78] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 25.000W
--
89:00.1 Audio device: NVIDIA Corporation TU102 High Definition Audio Controller (rev a1)
        Subsystem: eVga.com. Corp. TU102 High Definition Audio Controller
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin B routed to IRQ 112
        NUMA node: 1
        Region 0: Memory at f5080000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [60] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000 Data: 0000
        Capabilities: [78] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 25.000W
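
Both functions show DisINTx- and INTx- above. In case it is useful for anyone
reproducing this, below is a small read-only sketch for eyeballing those bits
straight out of config space via sysfs. The file name intx_bits.c is just a
placeholder of mine, and it is only an approximation of what vfio-pci does:
as you say, the real probe also toggles DisINTx and reads it back, which is
the only part that can actually be verified.

/* intx_bits.c: read-only peek at the INTx-related config space bits.
 *
 * Illustration only, not the vfio-pci probe: the kernel additionally
 * toggles DisINTx (Command register bit 10) and reads it back, which is
 * the only part it can verify; whether the INTx Status bit (Status
 * register bit 3) is wired up correctly is down to the hardware.
 *
 * Build: gcc -o intx_bits intx_bits.c
 * Usage: ./intx_bits 0000:88:00.1
 */
#include <endian.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	char path[256];
	uint16_t command, status;
	uint8_t intline, intpin;
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <domain:bus:dev.fn>\n", argv[0]);
		return 1;
	}

	snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/config", argv[1]);
	fd = open(path, O_RDONLY);
	if (fd < 0) {
		perror(path);
		return 1;
	}

	/* Standard header offsets: Command 0x04, Status 0x06,
	 * Interrupt Line 0x3c, Interrupt Pin 0x3d (stored little-endian). */
	if (pread(fd, &command, sizeof(command), 0x04) != sizeof(command) ||
	    pread(fd, &status, sizeof(status), 0x06) != sizeof(status) ||
	    pread(fd, &intline, sizeof(intline), 0x3c) != sizeof(intline) ||
	    pread(fd, &intpin, sizeof(intpin), 0x3d) != sizeof(intpin)) {
		perror("pread");
		close(fd);
		return 1;
	}
	command = le16toh(command);
	status = le16toh(status);

	printf("%s: pin %c line %u DisINTx%c INTx%c\n", argv[1],
	       intpin ? 'A' + intpin - 1 : '?', (unsigned int)intline,
	       (command & (1 << 10)) ? '+' : '-',   /* Interrupt Disable */
	       (status  & (1 << 3))  ? '+' : '-');  /* Interrupt Status  */

	close(fd);
	return 0;
}

For the two audio functions above it should just confirm the DisINTx- / INTx-
state that lspci already shows, but it is handy for watching the bits change
while the guest is running.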

> I'm also confused why this doesn't trigger the same panic/kexec as we
> were seeing with the other interrupt lines. Are there some downstream
> patches or configs missing here that would promote these to more fatal
> errors?
>
There aren't any downstream patches; the machine lockup happens with regular
mainline kernels too. Even without panic_on_oops set, the system grinds to a
halt and hangs. The panic_on_oops sysctl was an attempt to get the machine to
reboot into the crashkernel and restart, but it didn't work very well since we
get stuck copying the IR tables from DMAR.

But your patches seem to fix the hang, which is very promising.

Thanks,
Matthew