Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops

From: Hans de Goede
Date: Thu Feb 21 2019 - 07:28:39 EST


Hi,

On 19-02-19 22:01, Thomas Gleixner wrote:
Hans,

On Tue, 19 Feb 2019, Hans de Goede wrote:

Cc+: ACPI/AMD folks

Various people are reporting false positive
"do_IRQ: #.55 No irq handler for vector"
messages on AMD ryzen based laptops, see e.g.:

https://bugzilla.redhat.com/show_bug.cgi?id=1551605

Which contains this dmesg snippet:

Feb 07 20:14:29 localhost.localdomain kernel: smp: Bringing up secondary CPUs ...
Feb 07 20:14:29 localhost.localdomain kernel: x86: Booting SMP configuration:
Feb 07 20:14:29 localhost.localdomain kernel: .... node #0, CPUs: #1
Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 1.55 No irq handler for vector
Feb 07 20:14:29 localhost.localdomain kernel: #2
Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 2.55 No irq handler for vector
Feb 07 20:14:29 localhost.localdomain kernel: #3
Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 3.55 No irq handler for vector
Feb 07 20:14:29 localhost.localdomain kernel: smp: Brought up 1 node, 4 CPUs
Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Max logical packages: 1
Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Total of 4 processors activated (15968.49 BogoMIPS)

It seems that we get an IRQ for each CPU as we bring it online,
which feels to me like it is some sorta false-positive.
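
For anyone collecting these reports, here is a minimal sketch that pulls the CPU and vector out of such messages. It assumes the message has the form "do_IRQ: <cpu>.<vector> No irq handler for vector", as in the snippet above; the function name and the sample text are only illustrative.

```python
import re

# Matches "do_IRQ: <cpu>.<vector> No irq handler for vector".
# \s+ before "vector" tolerates a line wrap introduced by a mail client.
PATTERN = re.compile(r"do_IRQ: (\d+)\.(\d+) No irq handler for\s+vector")


def unhandled_vectors(log_text):
    """Return a list of (cpu, vector) pairs, one per matching message."""
    return [(int(cpu), int(vec)) for cpu, vec in PATTERN.findall(log_text)]


# Hypothetical sample, shortened from the dmesg snippet quoted above.
sample = (
    "kernel: do_IRQ: 1.55 No irq handler for vector\n"
    "kernel: #2\n"
    "kernel: do_IRQ: 2.55 No irq handler for vector\n"
)

if __name__ == "__main__":
    for cpu, vec in unhandled_vectors(sample):
        print(f"CPU {cpu}: unhandled vector {vec}")
```

Feeding it the output of "journalctl -k" or "dmesg" should show at a glance whether every secondary CPU reports the same vector.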

Sigh, that looks like BIOS value add again.

It's not a false positive. Something _IS_ sending a vector 55 to these CPUs
for whatever reason.

I temporarily have access to a loaner laptop for a couple of weeks which shows
the same errors and I would like to fix this, but I don't really know how to
fix this.

Can you please enable CONFIG_GENERIC_IRQ_DEBUGFS and check in the files there
whether vector 55 is used on CPU0 and which device is associated with it.

ls /sys/kernel/debug/irq/domains gives:

AMD-IR-0 IO-APIC-IR-0 PCI-MSI-3 default
AMD-IR-MSI-0-3 IO-APIC-IR-1 VECTOR

None of the files under /sys/kernel/debug/irq/domains lists 55 under the
"vectors" column of its output. The part with the vectors column is identical
in all of them and looks like this:

 | CPU | avl | man | mac | act | vectors
     0   195     1     1     6  33-37,48
     1   195     1     1     6  33-38
     2   195     1     1     6  33-38
     3   195     1     1     6  33-38
     4   195     1     1     6  33-38
     5   195     1     1     6  33-38
     6   195     1     1     6  33-38
     7   195     1     1     6  33-38
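
Checking the vectors column by eye gets tedious, so here is a rough sketch that does the same check in code. It assumes each data row has six whitespace-separated fields with the CPU number first and the vector list (e.g. "33-37,48") last, as in the table above; the function names are made up for this example.

```python
def parse_vectors(spec):
    """Expand a list such as "33-37,48" into a set of vector numbers."""
    vectors = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            vectors.update(range(int(lo), int(hi) + 1))
        else:
            vectors.add(int(part))
    return vectors


def cpus_using_vector(rows, vector):
    """Return the CPU numbers whose vector list contains `vector`.

    Rows that do not start with a CPU number (such as the header row)
    or that have fewer than six fields are skipped.
    """
    hits = []
    for row in rows:
        fields = row.split()
        if len(fields) < 6 or not fields[0].isdigit():
            continue
        if vector in parse_vectors(fields[5]):
            hits.append(int(fields[0]))
    return hits


# Rows copied from the table above.
rows = [
    "| CPU | avl | man | mac | act | vectors",
    "0 195 1 1 6 33-37,48",
    "1 195 1 1 6 33-38",
]

if __name__ == "__main__":
    print(cpus_using_vector(rows, 55))
```

For the rows shown here, no CPU has vector 55 in its list, which matches what I see when reading the files by hand.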

cat /sys/kernel/debug/irq/irqs/55

Gives:

handler:  handle_fasteoi_irq
device:   (null)
status:   0x00004100
istate:   0x00000000
ddepth:   1
wdepth:   0
dstate:   0x0503a000
            IRQD_LEVEL
            IRQD_IRQ_DISABLED
            IRQD_IRQ_MASKED
            IRQD_SINGLE_TARGET
            IRQD_MOVE_PCNTXT
            IRQD_CAN_RESERVE
node:     -1
affinity: 0-15
effectiv: 0
pending:
domain:  IO-APIC-IR-1
 hwirq:   0x0
 chip:    IR-IO-APIC
  flags:   0x10
             IRQCHIP_SKIP_SET_WAKE
 parent:
    domain:  AMD-IR-0
     hwirq:   0x10000
     chip:    AMD-IR
      flags:   0x0
     parent:
        domain:  VECTOR
         hwirq:   0x37
         chip:    APIC
          flags:   0x0
         Vector:     0
         Target:     0
         move_in_progress: 0
         is_managed:       0
         can_reserve:      1
         has_reserved:     1
         cleanup_pending:  0

cat /proc/interrupts

Gives:

CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
0: 123 0 0 0 0 0 0 0 IR-IO-APIC 2-edge timer
1: 0 0 0 0 0 0 188 0 IR-IO-APIC 1-edge i8042
8: 0 0 0 0 0 0 0 1 IR-IO-APIC 8-edge rtc0
9: 0 6564 0 0 0 0 0 0 IR-IO-APIC 9-fasteoi acpi
12: 0 0 0 0 0 511 0 0 IR-IO-APIC 12-edge i8042
25: 0 0 0 0 0 0 0 0 PCI-MSI 4096-edge AMD-Vi
26: 0 0 0 0 0 0 0 0 IR-PCI-MSI 18432-edge PCIe PME, aerdrv
27: 0 0 0 0 0 0 0 0 IR-PCI-MSI 20480-edge PCIe PME, aerdrv
28: 0 0 0 0 0 0 0 0 IR-PCI-MSI 22528-edge PCIe PME, aerdrv
29: 0 0 0 0 0 0 0 0 IR-PCI-MSI 24576-edge PCIe PME, aerdrv
30: 0 0 0 0 0 0 0 0 IR-PCI-MSI 26624-edge PCIe PME, aerdrv
31: 0 0 0 0 0 0 0 0 IR-PCI-MSI 28672-edge PCIe PME, aerdrv
32: 0 0 0 0 0 0 0 0 IR-PCI-MSI 133120-edge PCIe PME
33: 0 0 0 0 0 0 0 0 IR-PCI-MSI 135168-edge PCIe PME
35: 0 0 0 0 0 0 0 0 IR-PCI-MSI 4194304-edge ahci[0000:08:00.0]
36: 0 0 0 0 0 0 0 0 IR-IO-APIC 15-fasteoi ehci_hcd:usb1
38: 0 0 0 0 0 0 0 0 IR-PCI-MSI 3676160-edge xhci_hcd
39: 0 0 0 0 0 0 0 0 IR-PCI-MSI 3676161-edge xhci_hcd
40: 0 0 0 0 0 0 0 0 IR-PCI-MSI 3676162-edge xhci_hcd
41: 0 0 0 0 0 0 0 0 IR-PCI-MSI 3676163-edge xhci_hcd
42: 0 0 0 0 0 0 0 0 IR-PCI-MSI 3676164-edge xhci_hcd
43: 0 0 0 0 0 0 0 0 IR-PCI-MSI 3676165-edge xhci_hcd
44: 0 0 0 0 0 0 0 0 IR-PCI-MSI 3676166-edge xhci_hcd
45: 0 0 0 0 0 0 0 0 IR-PCI-MSI 3676167-edge xhci_hcd
47: 0 0 0 0 0 623 0 0 IR-PCI-MSI 3678208-edge xhci_hcd
48: 0 0 0 0 0 0 0 0 IR-PCI-MSI 3678209-edge xhci_hcd
49: 0 0 0 0 0 0 0 0 IR-PCI-MSI 3678210-edge xhci_hcd
50: 0 0 0 0 0 0 0 0 IR-PCI-MSI 3678211-edge xhci_hcd
51: 0 0 0 0 0 0 0 0 IR-PCI-MSI 3678212-edge xhci_hcd
52: 0 0 0 0 0 0 0 0 IR-PCI-MSI 3678213-edge xhci_hcd
53: 0 0 0 0 0 0 0 0 IR-PCI-MSI 3678214-edge xhci_hcd
54: 0 0 0 0 0 0 0 0 IR-PCI-MSI 3678215-edge xhci_hcd
56: 22 0 0 0 0 0 0 0 IR-PCI-MSI 524288-edge rtsx_pci
58: 0 37 0 0 0 0 0 0 IR-PCI-MSI 1572864-edge nvme0q0
59: 3838 0 0 0 0 0 0 0 IR-PCI-MSI 1572865-edge nvme0q1
60: 0 2036 0 0 0 0 0 0 IR-PCI-MSI 1572866-edge nvme0q2
61: 0 0 3525 0 0 0 0 0 IR-PCI-MSI 1572867-edge nvme0q3
62: 0 0 0 5013 0 0 0 0 IR-PCI-MSI 1572868-edge nvme0q4
63: 0 0 0 0 3025 0 0 0 IR-PCI-MSI 1572869-edge nvme0q5
64: 0 0 0 0 0 2271 0 0 IR-PCI-MSI 1572870-edge nvme0q6
65: 0 0 0 0 0 0 3948 0 IR-PCI-MSI 1572871-edge nvme0q7
66: 0 0 0 0 0 0 0 2094 IR-PCI-MSI 1572872-edge nvme0q8
67: 0 0 0 0 0 0 0 0 IR-PCI-MSI 1572873-edge nvme0q9
68: 0 0 0 0 0 0 0 0 IR-PCI-MSI 1572874-edge nvme0q10
69: 0 0 0 0 0 0 0 0 IR-PCI-MSI 1572875-edge nvme0q11
70: 0 0 0 0 0 0 0 0 IR-PCI-MSI 1572876-edge nvme0q12
71: 0 0 0 0 0 0 0 0 IR-PCI-MSI 1572877-edge nvme0q13
72: 0 0 0 0 0 0 0 0 IR-PCI-MSI 1572878-edge nvme0q14
73: 0 0 0 0 0 0 0 0 IR-PCI-MSI 1572879-edge nvme0q15
74: 0 0 0 0 0 0 0 0 IR-PCI-MSI 1572880-edge nvme0q16
75: 0 0 7598 0 0 0 0 0 IR-PCI-MSI 3670016-edge amdgpu
77: 0 0 0 0 0 0 0 0 IR-PCI-MSI 2097152-edge enp4s0f0
79: 0 0 0 0 0 0 0 0 IR-PCI-MSI 3145728-edge enp6s0
81: 0 0 0 527 0 0 0 0 IR-PCI-MSI 3672064-edge snd_hda_intel:card0
82: 0 0 0 0 930 0 0 0 IR-PCI-MSI 3682304-edge snd_hda_intel:card1
84: 0 0 0 0 0 15493 0 0 IR-PCI-MSI 1048576-edge r8822be
NMI: 2 1 1 1 1 1 1 1 Non-maskable interrupts
LOC: 55193 40080 52795 34289 48822 42298 57746 33306 Local timer interrupts
SPU: 0 0 0 0 0 0 0 0 Spurious interrupts
PMI: 2 1 1 1 1 1 1 1 Performance monitoring interrupts
IWI: 15286 10090 14311 9249 13054 23194 13384 9842 IRQ work interrupts
RTR: 0 0 0 0 0 0 0 0 APIC ICR read retries
RES: 26829 14012 14311 8544 12130 6480 13649 6414 Rescheduling interrupts
CAL: 15273 18572 16350 18090 14929 18234 17090 17644 Function call interrupts
TLB: 5771 5218 5098 5248 5571 3619 8354 5405 TLB shootdowns
TRM: 0 0 0 0 0 0 0 0 Thermal event interrupts
THR: 0 0 0 0 0 0 0 0 Threshold APIC interrupts
DFR: 0 0 0 0 0 0 0 0 Deferred Error APIC interrupts
MCE: 0 0 0 0 0 0 0 0 Machine check exceptions
MCP: 5 5 5 5 5 5 5 5 Machine check polls
HYP: 0 0 0 0 0 0 0 0 Hypervisor callback interrupts
HRE: 0 0 0 0 0 0 0 0 Hyper-V reenlightenment interrupts
HVS: 0 0 0 0 0 0 0 0 Hyper-V stimer0 interrupts
ERR: 0
MIS: 0
PIN: 0 0 0 0 0 0 0 0 Posted-interrupt notification event
NPI: 0 0 0 0 0 0 0 0 Nested posted-interrupt event
PIW: 0 0 0 0 0 0 0 0 Posted-interrupt wakeup event

I bet it's a legacy IRQ, and as that space starts at 48 (IRQ0), this should be
IRQ9, which is usually - DRUMROLL - the ACPI interrupt.
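
The vector-to-legacy-IRQ arithmetic can be sketched as below. It takes the base of 48 for IRQ0 from the paragraph above and assumes the usual 16 legacy IRQs; the names are hypothetical. Note that plain subtraction with this base gives 7 for vector 55, while IRQ9 would sit at vector 57, so the exact IRQ is worth double-checking on an affected machine.

```python
LEGACY_VECTOR_BASE = 48  # vector of IRQ0, as stated in the message above
LEGACY_IRQ_COUNT = 16    # assumption: the usual 16 legacy (ISA) IRQs


def legacy_irq_for_vector(vector, base=LEGACY_VECTOR_BASE):
    """Return the legacy IRQ number for `vector`, or None if out of range."""
    offset = vector - base
    if 0 <= offset < LEGACY_IRQ_COUNT:
        return offset
    return None


if __name__ == "__main__":
    for vec in (55, 57):
        print(f"vector {vec} -> legacy IRQ {legacy_irq_for_vector(vec)}")
```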

The kernel clearly sets that up to be delivered to CPU 0 only, but I've
seen before that the BIOS value add thinks that this setup is not
relevant.

/me goes off and sings LALALA

Regards,

Hans