Re: `do_IRQ: 1.55 No irq handler for vector` on ASRock E350M1

From: Kai Heng Feng
Date: Thu Mar 29 2018 - 03:21:37 EST


Hi Tom & Thomas,

On Feb 27, 2018, at 12:42 AM, Tom Lendacky <thomas.lendacky@xxxxxxx> wrote:

On 2/26/2018 10:37 AM, Morton, Eric wrote:
Thomas,

Yazen dug out PLAT-21393 as sounding like this issue. I haven't had a chance to digest it.

Yes, internally to AMD, that was the bug that tracked the issue I was
referring to.

There's also another user [1] affected by this issue.

Ethernet r8169 failed to work after system suspend:

[ 150.944101] do_IRQ: 3.37 No irq handler for vector
[ 150.944105] r8169 0000:01:00.0 enp1s0: link down

It's a regression started from v4.15-rc5.

[1] https://bugs.launchpad.net/bugs/1752772

Kai-Heng


Thanks,
Tom

Eric

ïOn 2/26/18, 10:31 AM, "Borislav Petkov" <bp@xxxxxxxxx> wrote:

On Mon, Feb 26, 2018 at 10:14:10AM -0600, Tom Lendacky wrote:
On 2/24/2018 2:59 AM, Thomas Gleixner wrote:
On Sat, 24 Feb 2018, Paul Menzel wrote:
Am 23.02.2018 um 20:09 schrieb Borislav Petkov:
On Fri, Feb 23, 2018 at 07:18:34PM +0100, Thomas Gleixner wrote:
Borislav is seeing similar issues on larger AMD machines. The interrupt
seems to come from BIOS/microcode during bringup of secondary CPUs and we
have no idea why.

Paul, can you boot 4.14 and grep your dmesg for something like:

[ 0.000000] spurious 8259A interrupt: IRQ7. >
?

No, I do not see that. Please find the logs attached.

From your 4.14 log:

Feb 19 09:48:06.843173 kodi kernel: CPU 0 irqstacks, hard=e9b0a000 soft=e9b0c000
Feb 19 09:48:06.843216 kodi kernel: spurious 8259A interrupt: IRQ7.

I think I remember seeing something like this previously and it turned out
to be a BIOS bug. All the AP's were enabled to work with the legacy 8259
interrupt controller. In an SMP system, only one processor in the system
should be configured to handle legacy 8259 interrupts (ExtINT delivery
mode - see Intel's SDM, Volume 3, section 10.5.1, Delivery Mode). Once
the BIOS was fixed, the spurious interrupt message went away.

I believe at some point during UEFI, the APs were exposed to an ExtINT
interrupt. Since they were configured to handle ExtINT delivery mode and
interrupts were not yet enabled, the interrupt was left pending. When the
APs were started by the OS and interrupts were enabled, the interrupt
triggered. Since the original pending interrupt was handled by the BSP,
there was no longer an interrupt actually pending, so the 8259 responds
with IRQ 7 when queried by the OS. This occurred for each AP.

Interesting - is this something that can happen on Zen too?

Because I have such reports too.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.