Re: [PATCH v3 1/3] iova: convert from rbtree to maple tree
From: Rik van Riel
Date: Thu Jun 18 2026 - 23:48:21 EST
On Wed, 2026-06-17 at 15:55 -0400, Liam R. Howlett wrote:
> On 26/06/02 11:35PM, Rik van Riel wrote:
> > From: Rik van Riel <riel@xxxxxxxx>
> >
> > - struct iova_domain replaces rb_root + cached_node +
> > cached32_node
> > + anchor with a single struct maple_tree. The iova_rbtree_lock
> > spinlock is renamed to iova_lock. The maple tree is initialized
> > with MT_FLAGS_ALLOC_RANGE (enables gap tracking for
> > mas_empty_area_rev) and MT_FLAGS_LOCK_EXTERN (uses the existing
> > iova_lock spinlock instead of the maple tree internal
> > lock)there .
>
> If the spinlock is only used to protect the tree, why are you using
> an
> external lock?
The IOVA code needs an irqsafe lock, but the lock also
protects some other things like iovad->max32_alloc_size,
and the failed-to-rebalance-deferred free list from patch 3.
Implementing something behind the MT_FLAGS_LOCK_IRQ
would address the first thing, but not the second.
>
> >
> > - /* Walk the tree backwards */
> > - spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
> > + spin_lock_irqsave(&iovad->iova_lock, flags);
> > if (limit_pfn <= iovad->dma_32bit_pfn &&
> > size >= iovad->max32_alloc_size)
> > goto iova32_full;
>
> This check seems to ensure the request is either both 32bit safe or
> it's
> fine to pass. If a 64bit allocation uses a larger size and we fail
> later, aren't we overwriting the 32bit size safety in your alloc_fail
> label? It seems wrong to be changing the max32_alloc_size in the
> error
> recovery.
>
> This also could happen outside the spinlock? I'm not sure why it was
> inside to begin with.
Good eyes. There's a bug here. While the iova32_full
code already has a limit_pfn check, we could also end
up there because the GFP_ATOMIC allocation failed,
turning a temporary failure into a permanent one.
This will be fixed for v4.
>
> > + new_pfn = (mas.last - size + 1) & align_mask;
> > + if (new_pfn < mas.index || new_pfn < iovad->start_pfn)
> > + goto alloc_fail;
>
> Neither of these can ever happen..? You've searched for a gap that's
> the worst case alignment and now you're ensuring the offset off the
> end
> aligned isn't smaller than the start of the gap or the end of your
> search. I'm no LLM, I don't think this is possible.
I'll drop these for v4.
> > + spin_lock_irqsave(&iovad->iova_lock, flags);
> > + {
> > + MA_STATE(mas, &iovad->mtree, pfn_lo, pfn_hi);
>
> You might want to look at mas_init() instead of the extra tab block.
>
That is nicer. Thank you.
> > - /* We are here either because this is the first reserver
> > node
> > - * or need to insert remaining non overlap addr range
> > - */
> > - iova = __insert_new_range(iovad, pfn_lo, pfn_hi);
> > -finish:
> > + iova = alloc_and_init_iova(merged_lo, merged_hi);
> > + if (!iova) {
> > + spin_unlock_irqrestore(&iovad->iova_lock, flags);
> > + return NULL;
>
> A goto the same repeated block below might be in
> order?
>
Fixed for v4.
> > + }
> > +
> > + {
> > + MA_STATE(mas, &iovad->mtree, merged_lo,
> > merged_hi);
> >
> > - spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
>
> The maple state keeps track of where things are, so if you have a
> maple
> state pointing at the correct slot, then you can avoid rewalking the
> tree. This is rather complicated, but it will save you a hand full
> of
> dereferences if you need them.
This code is for the overlap case, where a new
iova mapping is explicitly set to overlap already
existing mappings.
This makes the maple tree logic even more complex,
and I'm not sure whether this case is common enough
to justify that.
>
> > @@ -956,8 +808,8 @@ int iova_cache_get(void)
> >
> > mutex_lock(&iova_cache_mutex);
> > if (!iova_cache_users) {
> > - iova_cache = kmem_cache_create("iommu_iova",
> > sizeof(struct iova), 0,
> > - SLAB_HWCACHE_ALIGN,
> > NULL);
> > + iova_cache = kmem_cache_create("iommu_iova",
> > sizeof(struct iova),
> > + 0, 0, NULL);
>
> Was this in the patch notes or expected to change?
This change is what reduces the allocated size of the
iova struct from 64 bytes to 16 bytes.
That seems desired, and potentially important for
some users.
I believe I also covered it in the changelog.
--
All Rights Reversed.