Re: lockdep splat due to klist iteration from atomic context in Intel IOMMU driver

From: Baolu Lu
Date: Wed Aug 17 2022 - 00:45:38 EST

In reply to: Baolu Lu: "Re: lockdep splat due to klist iteration from atomic context in Intel IOMMU driver"
Next in thread: Lennert Buytenhek: "Re: lockdep splat due to klist iteration from atomic context in Intel IOMMU driver"

On 2022/8/15 21:32, Bart Van Assche wrote:
> On 8/15/22 05:05, Lennert Buytenhek wrote:
>> On a build of 7ebfc85e2cd7 ("Merge tag 'net-6.0-rc1' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net"), with
>> CONFIG_INTEL_IOMMU_DEBUGFS enabled, I am seeing the lockdep splat
>> below when an I/O page fault occurs on a machine with an Intel
>> IOMMU in it.
>>
>> The issue seems to be the klist iterator functions using
>> spin_*lock_irq*() but the klist insertion functions using
>> spin_*lock(), combined with the Intel DMAR IOMMU driver iterating
>> over klists from atomic (hardirq) context as of commit 8ac0b64b9735
>> ("iommu/vt-d: Use pci_get_domain_bus_and_slot() in pgtable_walk()")
>> when CONFIG_INTEL_IOMMU_DEBUGFS is enabled, where
>> pci_get_domain_bus_and_slot() calls into bus_find_device() which
>> iterates over klists.
>>
>> I found this commit from 2018:
>>
>>     commit 624fa7790f80575a4ec28fbdb2034097dc18d051
>>     Author: Bart Van Assche <bvanassche@xxxxxxx>
>>     Date:   Fri Jun 22 14:54:49 2018 -0700
>>
>>         scsi: klist: Make it safe to use klists in atomic context
>>
>> This commit switched lib/klist.c:klist_{prev,next} from
>> spin_{,un}lock() to spin_{lock_irqsave,unlock_irqrestore}(), but left
>> the spin_{,un}lock() calls in add_{head,tail}() untouched.
>>
>> The simplest fix for this would be to switch lib/klist.c:add_{head,tail}()
>> over to use the IRQ-safe spinlock variants as well?
>
> Another possibility would be to evaluate whether it is safe to revert
> commit 624fa7790f80 ("scsi: klist: Make it safe to use klists in atomic
> context"). That commit is no longer needed by the SRP transport driver
> since the legacy block layer has been removed from the kernel.

If so, pci_get_domain_bus_and_slot() cannot be used in this interrupt
context, right?
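
For reference, the hazard being discussed can be sketched with a toy
single-CPU model in plain user-space C. This is only an illustration
under stated assumptions, not kernel code: every toy_* name below is
hypothetical, the "lock" is just a flag, and the "interrupt" is a
function called in the middle of the critical section. It compares an
insertion path that takes the lock with interrupts left enabled (as
spin_lock() does) against one that masks interrupts first (as
spin_lock_irqsave() does), when an interrupt handler on the same CPU
wants the same lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are kernel APIs. */
struct toy_cpu  { bool irqs_enabled; };
struct toy_lock { bool held; };

/* Models spin_lock(): takes the lock, leaves the IRQ state alone. */
static void toy_lock_plain(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void toy_unlock_plain(struct toy_lock *l)
{
	l->held = false;
}

/* Models spin_lock_irqsave(): masks IRQs, returns the previous state. */
static bool toy_lock_irqsave(struct toy_cpu *cpu, struct toy_lock *l)
{
	bool flags = cpu->irqs_enabled;

	cpu->irqs_enabled = false;
	assert(!l->held);
	l->held = true;
	return flags;
}

/* Models spin_unlock_irqrestore(): drops the lock, restores IRQ state. */
static void toy_unlock_irqrestore(struct toy_cpu *cpu, struct toy_lock *l,
				  bool flags)
{
	l->held = false;
	cpu->irqs_enabled = flags;
}

/*
 * Models an interrupt arriving on this CPU whose handler needs the lock
 * (think: a fault handler iterating the list).
 * Returns  0 if IRQs are masked, so the handler does not run yet,
 *          1 if the handler runs and can take the lock,
 *         -1 if the handler would spin on a lock its own CPU holds.
 */
static int toy_irq_arrives(const struct toy_cpu *cpu, const struct toy_lock *l)
{
	if (!cpu->irqs_enabled)
		return 0;
	return l->held ? -1 : 1;
}

/* Insertion under the plain lock, interrupted inside the critical section. */
static int toy_insert_plain(struct toy_cpu *cpu, struct toy_lock *l)
{
	int r;

	toy_lock_plain(l);
	r = toy_irq_arrives(cpu, l);
	toy_unlock_plain(l);
	return r;
}

/* Same insertion, but with IRQs masked for the duration of the lock. */
static int toy_insert_irqsave(struct toy_cpu *cpu, struct toy_lock *l)
{
	bool flags = toy_lock_irqsave(cpu, l);
	int r = toy_irq_arrives(cpu, l);

	toy_unlock_irqrestore(cpu, l, flags);
	return r;
}
```

Starting from a CPU with interrupts enabled and the lock free,
toy_insert_plain() reports -1 (the handler would spin on a lock held by
the code it interrupted), while toy_insert_irqsave() reports 0 and
leaves the interrupt state as it found it. That is the inconsistency
lockdep complains about, and why either every user of the lock has to
be IRQ-safe or the lock must not be taken from interrupt context at all.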

Best regards,
baolu
