Re: [PATCH] PCI/MSI: add iounmap in msix_capability_init()
From: Bjorn Helgaas
Date: Wed Jun 10 2026 - 15:44:20 EST
On Wed, Jun 10, 2026 at 05:19:51PM +0800, Yuanhe Shu wrote:
> On Mon, 16 Mar 2026 at 10:22:12 -0700, Guenter Roeck wrote:
> > Does msix_setup_interrupts() already unmap this region? If it fails,
> > msix_setup_interrupts() calls pci_free_msi_irqs(), which appears to already
> > call iounmap(dev->msix_base) and sets dev->msix_base to NULL.
> >
> > If dev->msix_base was already set to NULL by pci_free_msi_irqs() during the
> > msix_setup_interrupts() error path, does this result in iounmap(NULL)?
>
> Confirmed. We hit this on Intel Emerald Rapids (192 CPUs) when running
> tools/testing/selftests/kexec/test_kexec_jump.sh.
>
> The commit message of 1a8d4c6ecb4c states:
>
> "msix_capability_init() fails to unmap the MSI-X region if
> msix_setup_interrupts() fails."
>
> This is incorrect. There is no leak on this error path.
>
> When msix_setup_interrupts() fails, the call chain is:
>
> msix_setup_interrupts()
> -> __msix_setup_interrupts()
> struct pci_dev *dev __free(free_msi_irqs) = __dev;
> ...
> return ret; // __free cleanup fires on error
>
> The __free(free_msi_irqs) cleanup calls pci_free_msi_irqs(), which
> already handles the unmap:
>
> void pci_free_msi_irqs(struct pci_dev *dev)
> {
> pci_msi_teardown_msi_irqs(dev);
> if (dev->msix_base) {
> iounmap(dev->msix_base); // already unmapped here
> dev->msix_base = NULL; // and set to NULL
> }
> }
>
> So dev->msix_base is unmapped and NULLed before msix_setup_interrupts()
> even returns to msix_capability_init(). The original "goto out_disable"
> was correct -- it skipped iounmap because the cleanup was already done.
>
> Commit 1a8d4c6ecb4c changed it to "goto out_unmap", adding a second
> iounmap() call on an already-NULL pointer, which triggers:
>
> WARNING: CPU#44 at iounmap+0x2a/0xe0
> Workqueue: nvme-reset-wq nvme_reset_work [nvme]
> RIP: 0010:iounmap+0x2a/0xe0
> ...
> RDI: 0000000000000000
> ...
> Call Trace:
> <TASK>
> msix_capability_init+0x317/0x3f0
> __pci_enable_msix_range+0x21d/0x2c0
> pci_alloc_irq_vectors_affinity+0xa9/0x130
> nvme_setup_io_queues+0x2a8/0x420 [nvme]
> ? __pfx_nvme_calc_irq_sets+0x10/0x10 [nvme]
> nvme_reset_work+0x151/0x340 [nvme]
> process_one_work+0x197/0x3a0
> worker_thread+0x1ab/0x320
> ? __pfx_worker_thread+0x10/0x10
>
> RDI=0 confirms iounmap() is called with NULL.
>
> This commit should probably be reverted. The out_unmap label and its
> iounmap() are redundant with the __free(free_msi_irqs) scoped cleanup
> in __msix_setup_interrupts(), which already takes care of unmapping
> dev->msix_base on any error path.
Thanks a lot for testing and verifying the issue!
I guess it's 1a8d4c6ecb4c ("PCI/MSI: Unmap MSI-X region on error")
that you think should be reverted? Can you or Haoxiang send a patch
to do that so this is unambiguous?
It looks like Thomas applied 1a8d4c6ecb4c, so I'll leave it up to him
to handle any revert.