Re: [PATCH rc v6 1/7] iommu/arm-smmu-v3: Add arm_smmu_kdump_adopt_strtab() for kdump

From: Nicolin Chen

Date: Mon Jun 29 2026 - 00:02:19 EST

On Sun, Jun 28, 2026 at 11:00:43PM +0000, Pranjal Shrivastava wrote:
> On Wed, May 20, 2026 at 10:03:18AM -0700, Nicolin Chen wrote:
> > +static int arm_smmu_kdump_adopt_l2_strtab(struct arm_smmu_device *smmu, u32 sid,
> > + phys_addr_t base, u32 span,
> > + struct arm_smmu_strtab_l2 **l2table)
[...]
> > +
> > + /*
> > + * Retest the span in case the L1 descriptor has been overwritten since
> > + * the adopt. Reject this master's insert; panic or SMMU-disable would
> > + * either lose the vmcore or cascade aborts. Do not try to fix it, as it
> > + * would break all other SIDs in the same bus (PCI case). The corruption
> > + * blast radius is already bounded to that bus range.
> > + */
> > + if (span != STRTAB_SPLIT + 1) {
> > + dev_err(smmu->dev,
> > + "kdump: L1[%u] span %u changed since adopt (was %u)\n",
> > + arm_smmu_strtab_l1_idx(sid), span, STRTAB_SPLIT + 1);
> > + return -EINVAL;
> > + }
> > +
> > + size = (1UL << (span - 1)) * sizeof(struct arm_smmu_ste);
> > +
> > + table = devm_memremap(smmu->dev, base, size, MEMREMAP_WB);
>
> Why do we use devm_memremap() here but memremap() in the rest of the
> functions (even for the L1 ptr)?

Because it's called by arm_smmu_init_l2_strtab() that is a later
stage (arm_smmu_insert_master) than the other adopt functions.

This is deliberate. Otherwise, no one would undo memremap.

I can add a line of note here.

> > +static int arm_smmu_kdump_adopt_strtab_2lvl(struct arm_smmu_device *smmu,
> > + u32 cfg_reg, phys_addr_t base)
[...]
> > +
> > + cfg->l2.l2ptrs =
> > + kcalloc(num_l1_ents, sizeof(*cfg->l2.l2ptrs), GFP_KERNEL);
> > + if (!cfg->l2.l2ptrs)
> > + return -ENOMEM;
>
> We shuold ummap cfg->l2.l1tab before returning -ENOMEM here

No. The caller would call cleanup() on the -ENOMEM.

> > + if (span != STRTAB_SPLIT + 1) {
> > + dev_err(smmu->dev,
> > + "kdump: L1[%u] unsupported span %u (vs %u)\n",
> > + i, span, STRTAB_SPLIT + 1);
> > + return -EINVAL;
>
> We leak kcalloc'd mem (l2.l2ptrs) here, also we should unmap cfg->l2.l1tab

Ditto.

> > +static int arm_smmu_kdump_adopt_strtab_linear(struct arm_smmu_device *smmu,
> > + u32 cfg_reg, phys_addr_t base)
> > +{
> > + u32 log2size = FIELD_GET(STRTAB_BASE_CFG_LOG2SIZE, cfg_reg);
> > + struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> > + unsigned int max_log2size;
> > + size_t size;
> > +
> > + /*
> > + * Only a coherent SMMU is supported at this moment. For a non-coherent
> > + * SMMU that wants to support ARM_SMMU_OPT_KDUMP_ADOPT, try MEMREMAP_WC.
> > + */
> > + if (WARN_ON(!(smmu->features & ARM_SMMU_FEAT_COHERENCY)))
> > + return -EOPNOTSUPP;
> > +
> > + /* Cap the size at what the kdump kernel itself would have allocated */
> > + if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB)
> > + max_log2size =
> > + ilog2(STRTAB_MAX_L1_ENTRIES * STRTAB_NUM_L2_STES);
>
> Looks like we'd never hit this if condition because we'd never support a
> "linear" strtab if the HW supports ARM_SMMU_FEAT_2_LVL_STRTAB. Please
> see arm_smmu_init_strtab:

It's a defensive check that Sashiko recommended, as we cannot be sure
that the crashed kernel was using the same driver, which I think it's
not bad to have..

> > + size = cfg->linear.num_ents * sizeof(struct arm_smmu_ste);
> > + cfg->linear.table = memremap(base, size, MEMREMAP_WB);
> > + if (!cfg->linear.table)
> > + return -ENOMEM;
>
> We seem to skips initializing cfg->linear.ste_dma (it is populated in
> arm_smmu_init_strtab_linear)

Yes, deliberately since it's not used anywhere in kdump adopt mode.

> While the comment notes that arm_smmu_write_strtab() is bypassed in the
> kdump case, leaving cfg->linear.ste_dma uninitialized seems like a
> ticking time bomb if any other part of the driver ever uses it.

I can't think of a reason...

> > +static void arm_smmu_kdump_adopt_cleanup(void *data)
> > +{
> > + struct arm_smmu_device *smmu = data;
> > + u32 cfg_reg = readl_relaxed(smmu->base + ARM_SMMU_STRTAB_BASE_CFG);
>
> I'm worried about reading the HW register here, since this is a devres action, it
> can run after arm_smmu_device_remove() or arm_smmu_device_shutdown()
> (which would call rpm_put()). Please see __device_release_driver[1]:
>
> pm_runtime_put_sync(dev); <--- HW turned off
>
> device_remove(dev);
>
> if (dev->bus && dev->bus->dma_cleanup)
> dev->bus->dma_cleanup(dev);
>
> device_unbind_cleanup(dev); <--- This is where devm_release runs
> device_links_driver_cleanup(dev);
>
> Thus, even if we call rpm_get() here it would make no sense as the
> register contents would've been lost. Can we rely on some SW state
> here? smmu->features & 2LVL or maybe add an fmt in cfg?

Yes. Sashiko pointed it out too. We can do feature bit.

> > + * format, enforce the same format to match the adopted table.
> > + */
> > + ret = arm_smmu_kdump_adopt_strtab_linear(smmu, cfg_reg, base);
> > + if (!ret)
> > + smmu->features &= ~ARM_SMMU_FEAT_2_LVL_STRTAB;
>
> IIRC, this should NOT be set if we selected the linear format.
> Looking at arm_smmu_init_strtab(), if the HW supports 2-level tables,
> the driver unconditionally selects it. What is the expected scenario
> where the previous kernel would have allocated a linear table on 2-level
> capable hardware? IMO, it is a bug if we see linear fmt with this
> feature set. Am I missing something?

Again, the crashed kernel doesn't have to use the same driver.

I will address the rest of your comments.

Thanks
Nicolin