Re: lockdep splat due to klist iteration from atomic context in Intel IOMMU driver

From: Bart Van Assche
Date: Mon Aug 15 2022 - 09:33:14 EST


On 8/15/22 05:05, Lennert Buytenhek wrote:
On a build of 7ebfc85e2cd7 ("Merge tag 'net-6.0-rc1' of
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net"), with
CONFIG_INTEL_IOMMU_DEBUGFS enabled, I am seeing the lockdep splat
below when an I/O page fault occurs on a machine with an Intel
IOMMU in it.

The issue seems to be that the klist iterator functions use
spin_*lock_irq*() while the klist insertion functions use plain
spin_*lock(), combined with the Intel DMAR IOMMU driver iterating
over klists from atomic (hardirq) context as of commit 8ac0b64b9735
("iommu/vt-d: Use pci_get_domain_bus_and_slot() in pgtable_walk()")
when CONFIG_INTEL_IOMMU_DEBUGFS is enabled:
pci_get_domain_bus_and_slot() calls into bus_find_device(), which
iterates over klists.  Lockdep therefore sees the same klist lock
taken both from hardirq context and, on the insertion path, with
interrupts enabled, which is the inconsistent lock usage it warns
about.
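
For reference, the mismatch in lib/klist.c looks roughly like this
(abridged and paraphrased, so not the exact mainline code):

/* Insertion side: takes k_lock without disabling interrupts. */
static void add_head(struct klist *k, struct klist_node *n)
{
	spin_lock(&k->k_lock);
	list_add(&n->n_node, &k->k_list);
	spin_unlock(&k->k_lock);
}

/* Iteration side: IRQ-safe locking. */
struct klist_node *klist_next(struct klist_iter *i)
{
	unsigned long flags;

	spin_lock_irqsave(&i->i_klist->k_lock, flags);
	/* ... advance i->i_cur to the next live node ... */
	spin_unlock_irqrestore(&i->i_klist->k_lock, flags);
	return i->i_cur;
}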

I found this commit from 2018:

commit 624fa7790f80575a4ec28fbdb2034097dc18d051
Author: Bart Van Assche <bvanassche@xxxxxxx>
Date: Fri Jun 22 14:54:49 2018 -0700

scsi: klist: Make it safe to use klists in atomic context

This commit switched lib/klist.c:klist_{prev,next} from
spin_{,un}lock() to spin_{lock_irqsave,unlock_irqrestore}(), but left
the spin_{,un}lock() calls in add_{head,tail}() untouched.

Would the simplest fix for this be to switch
lib/klist.c:add_{head,tail}() over to the IRQ-safe spinlock variants
as well?
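
Concretely, that would be something like this (untested sketch, for
illustration only):

static void add_head(struct klist *k, struct klist_node *n)
{
	unsigned long flags;

	spin_lock_irqsave(&k->k_lock, flags);
	list_add(&n->n_node, &k->k_list);
	spin_unlock_irqrestore(&k->k_lock, flags);
}

static void add_tail(struct klist *k, struct klist_node *n)
{
	unsigned long flags;

	spin_lock_irqsave(&k->k_lock, flags);
	list_add_tail(&n->n_node, &k->k_list);
	spin_unlock_irqrestore(&k->k_lock, flags);
}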

Another possibility would be to evaluate whether it is safe to revert
commit 624fa7790f80 ("scsi: klist: Make it safe to use klists in
atomic context"). That commit is no longer needed by the SRP transport
driver since the legacy block layer has been removed from the kernel.
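
A revert would essentially take klist_{prev,next}() back to plain
spin_{,un}lock() calls, roughly like this (paraphrased, not the exact
pre-624fa7790f80 code):

struct klist_node *klist_next(struct klist_iter *i)
{
	spin_lock(&i->i_klist->k_lock);
	/* ... advance i->i_cur to the next live node ... */
	spin_unlock(&i->i_klist->k_lock);
	return i->i_cur;
}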

Thanks,

Bart.