Re: [PROBLEM] c5.metal on AWS fails to kexec after "PCI: Explicitly put devices into D0 when initializing"
From: Mario Limonciello
Date: Tue Jan 06 2026 - 01:06:37 EST
On 12/4/2025 11:31 PM, Mario Limonciello wrote:
On 12/4/2025 9:10 PM, Matthew Ruffell wrote:
Sorry, I accidentally sent the message.
The nvme was still in state 0 / PCI_D0:
[ 109.801025] mruffell: vendor: 1d0f, device: 61, state: 0
[ 109.819542] nvme 0000:90:00.0: mruffell: Current PCI device.
/sys/bus/pci/devices$ ll
lrwxrwxrwx 1 root root 0 Dec 4 23:24 0000:90:00.0@ -> ../../../devices/pci0000:7a/0000:7a:02.0/0000:8d:00.0/0000:8e:01.0/0000:90:00.0
All of these devices are also in state 0. Interesting.
I have a relatively ignorant question. Can you reproduce with kdump
and a crash too?
If you configure kdump and then crash the kernel (say, with the magic
SysRq key), I don't actually know whether pci_device_shutdown() gets
called in order to do the kexec. Or, because the kernel is already in
a crash state, is there just a jump to the crash kernel image location?
I did check this. I triggered a crash with magic SysRq, and
pci_device_shutdown() was never called. None of my debug messages from
pci_device_shutdown() were printed; instead it just oopsed and booted
straight to the crash kernel.
Thanks,
Matthew
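For readers following along, the difference between the two paths
observed above can be sketched as a small userspace model. This is not
kernel code: the struct and every model_* function are hypothetical
names made up for illustration. It assumes, as the test above
indicates, that an orderly kexec runs the device shutdown hooks before
jumping to the new image, while the crash path jumps without running
them.

```c
#include <stdbool.h>

/* Hypothetical trace for the model; nothing here is kernel API. */
struct model_trace {
	int shutdown_calls;	/* times the device shutdown hook ran */
	bool jumped;		/* whether control reached the new image */
};

/* Stand-in for a per-device shutdown hook such as pci_device_shutdown(). */
static void model_device_shutdown(struct model_trace *t)
{
	t->shutdown_calls++;
}

/* Stand-in for the final jump into the loaded image. */
static void model_jump_to_image(struct model_trace *t)
{
	t->jumped = true;
}

/* Orderly kexec: shutdown hooks run first, then the jump. */
static struct model_trace model_kexec_reboot(void)
{
	struct model_trace t = { 0 };

	model_device_shutdown(&t);
	model_jump_to_image(&t);
	return t;
}

/* Crash (kdump) path: the hooks are skipped entirely. */
static struct model_trace model_crash_kexec(void)
{
	struct model_trace t = { 0 };

	model_jump_to_image(&t);
	return t;
}
```

In this model the crash path never touches the device, which matches
the absence of the debug messages in the sysrq test.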
OK, so as I see it we have two options, both of which you proved work.
1) Call pci_set_master() during startup.
2) Drop pci_clear_master() for the kexec case during shutdown.
I think we need comments from Bjorn here on which direction is safer
generally speaking.
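To make the two options concrete, here is a minimal userspace sketch of
how each one affects the Bus Master bit across a kexec. It is a model
under stated assumptions, not a proposed patch: struct model_dev, enum
fix, and the model_* functions are hypothetical, and the only value
taken from the real world is PCI_COMMAND_MASTER (0x4), the Bus Master
Enable bit in the PCI Command register. The model assumes the old
kernel clears that bit at shutdown when a kexec is in progress, and
that nothing sets it again early in the new kernel unless option 1 is
applied.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bus Master Enable bit in the PCI Command register. */
#define PCI_COMMAND_MASTER 0x4

/* Hypothetical stand-in for a device; not the kernel's struct pci_dev. */
struct model_dev {
	uint16_t command;	/* models the PCI Command register */
};

/* The two options from the thread, plus the unfixed behavior. */
enum fix {
	FIX_NONE,
	FIX_SET_MASTER_AT_STARTUP,	/* option 1 */
	FIX_KEEP_MASTER_AT_SHUTDOWN,	/* option 2 */
};

/* Old kernel: on kexec, clear bus mastering unless option 2 is applied. */
static void model_shutdown(struct model_dev *dev, bool kexec_in_progress,
			   enum fix fix)
{
	if (kexec_in_progress && fix != FIX_KEEP_MASTER_AT_SHUTDOWN)
		dev->command &= (uint16_t)~PCI_COMMAND_MASTER;
}

/* New kernel: early init sets bus mastering only under option 1. */
static void model_startup(struct model_dev *dev, enum fix fix)
{
	if (fix == FIX_SET_MASTER_AT_STARTUP)
		dev->command |= PCI_COMMAND_MASTER;
}

/* Returns true if bus mastering is enabled after shutdown plus startup. */
static bool master_enabled_after_kexec(enum fix fix)
{
	struct model_dev dev = { .command = PCI_COMMAND_MASTER };

	model_shutdown(&dev, true, fix);
	model_startup(&dev, fix);
	return (dev.command & PCI_COMMAND_MASTER) != 0;
}
```

In the model, FIX_NONE leaves the bit clear after kexec, while either
option leaves it set. The model says nothing about which option is
safer on real hardware, which is the open question here.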
Hi Bjorn,
Can you review this thread and provide some comments on which way you
want to go to fix this issue?
Here's a full link to the rest of the thread if you don't have it.
https://lore.kernel.org/linux-pci/CAKAwkKvmdKxRRA4cR=jJEdyadon6uKXe+aFXaGSe=PNSgwDf9g@xxxxxxxxxxxxxx/#t
Thanks,