Re: [RFC PATCH v4] x86/kdump: terminate watchdog NMI interrupt to avoid kdump crashes
From: Zeng Heng
Date: Wed Feb 22 2023 - 21:29:44 EST
On 2023/2/23 2:39, Eric W. Biederman wrote:
Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:
On Fri, Feb 17, 2023 at 08:06:04PM +0800, Zeng Heng wrote:
If the CPU panics within NMI context, there may be unhandled NMIs
pending in the background, which the processor blocks until the next
IRET instruction executes. This blocking is what prevents nested
NMI handler execution.
If an IRET executes during the kdump reboot while no proper NMI handler
is registered at that point (such as during EFI loader)
EFI loader? kexec on panic is supposed to be kernel to kernel.
If someone is getting EFI involved that is a bug.
In the kdump path, kexec starts purgatory to verify the second kernel
with SHA-256. If verification passes, it hands control to the EFI loader,
and the second kernel is called to capture the environment as a vmcore file.
As the mail said, if the panic occurs within NMI context, we never exit
from that context until the EFI loader handles a page fault exception and
executes an IRET instruction on exit from the page fault handler. At that
moment, the processor allows the blocked NMI to be raised.
This kills all of perf, including but not limited to the hardware
watchdog. However, it does nothing to external NMI sources like the NMI
button found on some HP machines.
Still I suppose it is sufficient for the normal case.
I can't think of a reason why we don't just leave
NMIs deliberately disabled
How would we just leave NMIs disabled? Could you explain in more detail?
Zeng Heng
until the crash recovery kernel has figured out how to enable them safely.