Re: [PATCH rc v7 0/7] iommu/arm-smmu-v3: Fix device crash on kdump kernel
From: Mostafa Saleh
Date: Tue Jun 30 2026 - 09:18:30 EST
On Mon, Jun 29, 2026 at 11:15:33PM -0700, Nicolin Chen wrote:
> When transitioning to a kdump kernel, the primary kernel might have crashed
> while endpoint devices were actively bus-mastering DMA. Currently, the SMMU
> driver aggressively resets the hardware during probe by clearing CR0_SMMUEN
> and setting the Global Bypass Attribute (GBPA) to ABORT.
>
> In a kdump scenario, this aggressive reset is highly destructive:
> a) If GBPA is set to ABORT, in-flight DMA will be aborted, generating fatal
> PCIe AER or SErrors that may panic the kdump kernel
Can you please elaborate on those errors and on the conditions that
trigger them?
For example, patch 4 disables the EVTQ to avoid a potential flood of
events; why are those not fatal as well?
> b) If GBPA is set to BYPASS, in-flight DMA targeting some IOVAs will bypass
> the SMMU and corrupt the physical memory at those 1:1 mapped IOVAs.
>
> To safely absorb in-flight DMA, the kdump kernel must leave SMMUEN=1 intact
> and avoid modifying STRTAB_BASE. This allows HW to continue translating in-
> flight DMA using the crashed kernel's page tables until the endpoint device
> drivers probe and quiesce their respective hardware.
>
> However, the ARM SMMUv3 architecture specification states that updating the
> SMMU_STRTAB_BASE register while SMMUEN == 1 is UNPREDICTABLE or ignored.
>
> This leaves a kdump kernel no choice but to adopt the stream table from the
> crashed kernel.
In many cases the patches assume that the CDs/STEs might be corrupted,
but still attempt to retrieve them with some validation
(log2size/split...).
However, the base address might be broken, the TLB state is unknown...
IMO, although this might improve the status quo, it still relies on
heuristics and adds noticeable complexity to transition the stream
tables. I wonder if FW can deal with AER in that case before booting
the kdump kernel.
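
To make the "validation (log2size/split...)" point concrete, here is a
minimal userspace sketch of that kind of check. It is not code from the
series: strtab_cfg_plausible() is a hypothetical helper, the field
offsets are assumed to match the driver's STRTAB_BASE_CFG definitions,
and the accepted split values (6, 8, 10) are assumed from the
architecture. Note that it can only reject values that are inconsistent
with the HW; it cannot tell whether a well-formed value is the one the
crashed kernel actually programmed, which is the heuristic part.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed SMMU_STRTAB_BASE_CFG layout: FMT[17:16], SPLIT[10:6], LOG2SIZE[5:0] */
#define CFG_FMT_SHIFT      16
#define CFG_FMT_MASK       0x3u
#define CFG_FMT_LINEAR     0u
#define CFG_FMT_2LVL       1u
#define CFG_SPLIT_SHIFT    6
#define CFG_SPLIT_MASK     0x1fu
#define CFG_LOG2SIZE_MASK  0x3fu

/*
 * Hypothetical helper: returns true if the register value left behind by
 * the crashed kernel is at least self-consistent with what the HW reports.
 * sid_bits is the number of StreamID bits the SMMU implements and
 * hw_has_2lvl says whether it supports 2-level stream tables.
 */
static bool strtab_cfg_plausible(uint32_t cfg, unsigned int sid_bits,
				 bool hw_has_2lvl)
{
	uint32_t fmt = (cfg >> CFG_FMT_SHIFT) & CFG_FMT_MASK;
	uint32_t split = (cfg >> CFG_SPLIT_SHIFT) & CFG_SPLIT_MASK;
	uint32_t log2size = cfg & CFG_LOG2SIZE_MASK;

	/* A table cannot cover more StreamID bits than the HW implements */
	if (log2size > sid_bits)
		return false;

	/* Linear tables have no split to check */
	if (fmt == CFG_FMT_LINEAR)
		return true;

	/* Anything other than 2-level is a reserved encoding */
	if (fmt != CFG_FMT_2LVL || !hw_has_2lvl)
		return false;

	/* Assumed set of architected split values */
	return split == 6 || split == 8 || split == 10;
}
```

A value that passes this check can still point at a stale or overwritten
table, so the base address and the TLB contents remain unverified.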
Thanks,
Mostafa
>
> In this series:
> - Introduce an ARM_SMMU_OPT_KDUMP_ADOPT
> - Skip SMMUEN and STRTAB_BASE resets in arm_smmu_device_reset()
> - Skip EVENTQ/PRIQ setup including interrupts and their handlers
> - Memremap the crashed kernel's stream tables into the kdump kernel [*]
> - Defer any default domain attachment to retain STEs until device drivers
> explicitly request it.
>
> [*] For verification reasons, this series only fixes coherent SMMUs.
>
> For non-ARM_SMMU_OPT_KDUMP_ADOPT cases, keep a status quo since the commit
> 3f54c447df34f ("iommu/arm-smmu-v3: Don't disable SMMU in kdump kernel"):
> full reset followed by driver-initiated reattach, potentially rejecting any
> in-flight DMA.
>
> Note that the series requires Jason's work that was merged in v6.12: commit
> 85196f54743d ("iommu/arm-smmu-v3: Reorganize struct arm_smmu_strtab_cfg").
> I have a backported version that is verified with a v6.8 kernel. I can send
> if we see a strong need after this version is accepted.
>
> This is on Github:
> https://github.com/nicolinc/iommufd/commits/smmuv3_kdump-v7
>
> Changelog
> v7
> * Rebase v7.2-rc1
> * Add Reviewed-by from Pranjal
> * Reword the linear stream table adoption comment
> * Use dev_dbg for the stream table adoption message
> * Document why the lazy L2 adoption uses devm_memremap()
> * Drop redundant FEAT_COHERENCY checks in the adopt functions
> * Use feature bit instead of STRTAB_BASE_CFG in adopt cleanup
> * Skip CR0_ATSCHK update in adopt mode to retain the crashed policy
> * Restore FEAT_2_LVL_STRTAB if the cleanup action fails to register
> v6
> https://lore.kernel.org/all/cover.1779265413.git.nicolinc@xxxxxxxxxx/
> * Rebase v7.1-rc3
> * Add Reviewed-by from Jason
> * Replace dma_addr_t with phys_addr_t
> * Drop arm_smmu_kdump_phys_is_corrupted()
> * Skip threaded IRQ handlers for EVTQ and PRIQ
> * Bypass arm_smmu_rmr_install_bypass_ste() in kdump case
> * Drop devm_ for adopt-time allocations; set up cleanup function via
> devm_add_action_or_reset()
> v5
> https://lore.kernel.org/all/cover.1778416609.git.nicolinc@xxxxxxxxxx/
> * Add Reviewed-by from Kevin
> * Drop READ_ONCE on lazy-attach L1 read
> * Split "Skip EVTQ/PRIQ setup" into two patches
> * Tighten kdump probe comment and dev_warn message
> * Use MEM + BUSY in arm_smmu_kdump_phys_is_corrupted
> v4
> https://lore.kernel.org/all/cover.1777446969.git.nicolinc@xxxxxxxxxx/
> * Rebase v7.1-rc1
> * s/arm_smmu_adopt/arm_smmu_kdump_adopt
> * Revert alloc/memremap/fmt on fallback
> * Reorder patches to avoid bisect regression
> * Use IRQ_NONE for spurious evtq/priq entries
> * Cap linear log2size by kdump's allocation bound
> * Defer clearing FEAT_2_LVL_STRTAB on linear adopt
> * Add arm_smmu_kdump_phys_is_corrupted() validation
> * Defer l2 stream table memremap till master inserts
> * Re-validate L1 desc on master insert with READ_ONCE
> v3
> https://lore.kernel.org/all/cover.1777150307.git.nicolinc@xxxxxxxxxx/
> * s/OPT_KDUMP/OPT_KDUMP_ADOPT
> * Do not adopt if GERROR_SFM_ERR
> * Retain CR0_ATSCHK beside CR0_SMMUEN
> * Clear latched GERROR bits (e.g. CMDQ_ERR)
> * Assert ARM_SMMU_FEAT_COHERENCY in adopt functions
> * Add STE.Cfg check in arm_smmu_is_attach_deferred()
> * Fix validations on return codes from devm_memremap()
> * Sanitize crashed kernel register values in adopt functions
> * Drop unnecessary l2ptrs guard in arm_smmu_is_attach_deferred()
> * Don't enable PRIQ/EVTQ irqs and guard the irq functions for combined
> irq cases
> v2
> https://lore.kernel.org/all/cover.1776286352.git.nicolinc@xxxxxxxxxx/
> * Add warning in non-coherent SMMU cases
> * Keep eventq/priq disabled vs. enabling-and-disabling-later
> * Check KDUMP option in the beginning of arm_smmu_device_reset()
> * Validate STRTAB format matches HW capability instead of forcing flags
> v1:
> https://lore.kernel.org/all/cover.1775763475.git.nicolinc@xxxxxxxxxx/
>
> Nicolin Chen (7):
> iommu/arm-smmu-v3: Add arm_smmu_kdump_adopt_strtab() for kdump
> iommu/arm-smmu-v3: Implement is_attach_deferred() for kdump
> iommu/arm-smmu-v3: Do not enable EVTQ/PRIQ interrupts in kdump kernel
> iommu/arm-smmu-v3: Skip EVTQ/PRIQ setup in kdump kernel
> iommu/arm-smmu-v3: Retain CR0_SMMUEN during kdump device reset
> iommu/arm-smmu-v3: Skip RMR bypass for kdump adoption
> iommu/arm-smmu-v3: Detect ARM_SMMU_OPT_KDUMP_ADOPT in probe()
>
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 +
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 467 ++++++++++++++++++--
> 2 files changed, 422 insertions(+), 46 deletions(-)
>
> --
> 2.43.0
>