Re: [RFC PATCH v4] x86/kdump: terminate watchdog NMI interrupt to avoid kdump crashes

From: Zeng Heng
Date: Wed Feb 22 2023 - 22:14:16 EST



On 2023/2/23 10:29, Zeng Heng wrote:

On 2023/2/23 2:39, Eric W. Biederman wrote:
Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:

On Fri, Feb 17, 2023 at 08:06:04PM +0800, Zeng Heng wrote:
If the CPU panics in NMI context, there may be unhandled NMIs pending
in the background, which the processor keeps blocked until the next
IRET instruction executes. That blocking is what prevents nested
NMI handler execution.

In case an IRET executes during the kdump reboot and no proper NMI handler
is registered at that point (such as during the EFI loader)
EFI loader?  kexec on panic is supposed to be kernel to kernel.
If someone is getting EFI involved that is a bug.

In the kdump path, kexec starts purgatory to verify the second kernel with
sha256. If verification passes, it hands control to the EFI loader, which
calls the second kernel to capture the environment as a vmcore file.

As the mail said, if the panic occurs in NMI context, we never leave that
context until the EFI loader handles a page fault exception and executes
an IRET instruction on exit from the PF handler. At that moment, the
processor allows the blocked NMI to be raised.
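
To make the sequence above concrete, here is a toy model of the blocking
behaviour. It is only a sketch, not kernel code: the class ToyCpu, its
methods and the function replay_crash_scenario() are hypothetical names
made up for illustration, and the model assumes that the CPU latches at
most one NMI while NMIs are blocked and that any IRET lifts the block.

```python
class ToyCpu:
    """Toy model of x86 NMI blocking (illustration only, not kernel code)."""

    def __init__(self):
        self.nmi_blocked = False   # True from NMI delivery until the next IRET
        self.pending = None        # at most one NMI is latched while blocked
        self.delivered = []        # log of NMIs that actually reached a handler

    def raise_nmi(self, source):
        if self.nmi_blocked:
            # Held back by the processor; assumption: only one is latched.
            if self.pending is None:
                self.pending = source
        else:
            self._deliver(source)

    def _deliver(self, source):
        self.nmi_blocked = True    # further NMIs are blocked from here on
        self.delivered.append(source)

    def iret(self):
        # Any IRET lifts the block, including one returning from a
        # page fault handler rather than from the NMI handler itself.
        self.nmi_blocked = False
        if self.pending is not None:
            source, self.pending = self.pending, None
            self._deliver(source)


def replay_crash_scenario():
    cpu = ToyCpu()
    cpu.raise_nmi("watchdog")      # delivered; panic happens in this handler
    cpu.raise_nmi("watchdog")      # arrives while blocked, stays pending
    before_iret = list(cpu.delivered)
    cpu.iret()                     # e.g. loader returns from a page fault
    return before_iret, cpu.delivered
```

In this model the second watchdog NMI is delivered only when iret() runs,
which corresponds to the point where no proper NMI handler is installed.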


This kills all of perf, including but not limited to the hardware
watchdog. However, it does nothing to external NMI sources like the NMI
button found on some HP machines.

Still I suppose it is sufficient for the normal case.
I can't think of a reason why we don't just leave
NMIs deliberately disabled

native_machine_crash_shutdown() has already called lapic_shutdown() to
disable all IRQs, but the EFI loader assumes there are no residual NMIs
pending in the background.


Here is the first version of the patch for this issue:

https://lore.kernel.org/all/20230110102745.2514694-1-zengheng4@xxxxxxxxxx/

Zeng Heng


until the crash recovery kernel has figured out how to enable them safely.