On Sat, Jan 13, 2024 at 10:17:13AM -0800, Eric Badger wrote:
In the kdump kernel, the IOMMU will operate in deferred_attach mode. In
this mode, info->domain may not yet be assigned by the time the
release_device function is called, which leads to the following crash in
the crashkernel:
This never worked right? But why are you getting to release in a crash
kernel in the first place, that seems kinda weird..
BUG: kernel NULL pointer dereference, address: 000000000000003c
...
RIP: 0010:do_raw_spin_lock+0xa/0xa0
...
_raw_spin_lock_irqsave+0x1b/0x30
intel_iommu_release_device+0x96/0x170
iommu_deinit_device+0x39/0xf0
__iommu_group_remove_device+0xa0/0xd0
iommu_bus_notifier+0x55/0xb0
notifier_call_chain+0x5a/0xd0
blocking_notifier_call_chain+0x41/0x60
bus_notify+0x34/0x50
device_del+0x269/0x3d0
pci_remove_bus_device+0x77/0x100
p2sb_bar+0xae/0x1d0
...
i801_probe+0x423/0x740
Signed-off-by: Eric Badger <ebadger@xxxxxxxxxxxxxxx>
It should have a fixes line, but I'm not sure what it is..
---
drivers/iommu/intel/iommu.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
Unfortunately this issue is likely systemic in all the drivers.
Something I've been thinking about for a while now is to have the
option for the core code release to set the driver to a specific
releasing domain (probably a blocking or identity global static)
A lot of drivers are open coding this internally in their release
functions, like Intel. But it really doesn't deserve to be special.