Re: [PATCH] PCI/MSI: add iounmap in msix_capability_init()
From: Yuanhe Shu
Date: Wed Jun 10 2026 - 05:36:25 EST
On Mon, 16 Mar 2026 at 10:22:12 -0700, Guenter Roeck wrote:
> Does msix_setup_interrupts() already unmap this region? If it fails,
> msix_setup_interrupts() calls pci_free_msi_irqs(), which appears to already
> call iounmap(dev->msix_base) and sets dev->msix_base to NULL.
>
> If dev->msix_base was already set to NULL by pci_free_msi_irqs() during the
> msix_setup_interrupts() error path, does this result in iounmap(NULL)?
Confirmed. We hit this on Intel Emerald Rapids (192 CPUs) when running
tools/testing/selftests/kexec/test_kexec_jump.sh.
The commit message of 1a8d4c6ecb4c states:
"msix_capability_init() fails to unmap the MSI-X region if
msix_setup_interrupts() fails."
This is incorrect. There is no leak on this error path.
When msix_setup_interrupts() fails, the call chain is:
msix_setup_interrupts()
-> __msix_setup_interrupts()
struct pci_dev *dev __free(free_msi_irqs) = __dev;
...
return ret; // __free cleanup fires on error
The __free(free_msi_irqs) cleanup calls pci_free_msi_irqs(), which
already handles the unmap:
void pci_free_msi_irqs(struct pci_dev *dev)
{
pci_msi_teardown_msi_irqs(dev);
if (dev->msix_base) {
iounmap(dev->msix_base); // already unmapped here
dev->msix_base = NULL; // and set to NULL
}
}
So dev->msix_base is unmapped and NULLed before msix_setup_interrupts()
even returns to msix_capability_init(). The original "goto out_disable"
was correct -- it skipped iounmap because the cleanup was already done.
Commit 1a8d4c6ecb4c changed it to "goto out_unmap", adding a second
iounmap() call on an already-NULL pointer, which triggers:
WARNING: CPU#44 at iounmap+0x2a/0xe0
Workqueue: nvme-reset-wq nvme_reset_work [nvme]
RIP: 0010:iounmap+0x2a/0xe0
...
RDI: 0000000000000000
...
Call Trace:
<TASK>
msix_capability_init+0x317/0x3f0
__pci_enable_msix_range+0x21d/0x2c0
pci_alloc_irq_vectors_affinity+0xa9/0x130
nvme_setup_io_queues+0x2a8/0x420 [nvme]
? __pfx_nvme_calc_irq_sets+0x10/0x10 [nvme]
nvme_reset_work+0x151/0x340 [nvme]
process_one_work+0x197/0x3a0
worker_thread+0x1ab/0x320
? __pfx_worker_thread+0x10/0x10
RDI=0 confirms iounmap() is called with NULL.
This commit should probably be reverted. The out_unmap label and its
iounmap() are redundant with the __free(free_msi_irqs) scoped cleanup
in __msix_setup_interrupts(), which already takes care of unmapping
dev->msix_base on any error path.