Re: [PATCH 3/3] iova: defer maple tree erase on GFP_ATOMIC failure
From: Rik van Riel
Date: Fri Jun 26 2026 - 14:33:25 EST
On Thu, 2026-06-25 at 08:51 -0700, Ashok Raj wrote:
> On Tue, Jun 23, 2026 at 11:07:36PM -0400, Rik van Riel wrote:
>
> Hi Rik,
>
> Thanks for the v4 redesign — the in-place marker approach is much
> cleaner than
> the delayed_work/llist scheme from v3, and it directly addresses
> the
> retry-forever concern I raised earlier.
>
> [snip]
>
> > +/*
> > + * Remove an IOVA entry from the maple tree and free it.
> > + *
> > + * This runs in atomic context (DMA map/unmap can be called from
> > hardirq,
> > + * softirq, or with spinlocks held) and must not fail. Erasing an
> > entry can
> > + * require a maple tree node for rebalancing, and
> > mas_store_gfp(NULL,
> > + * GFP_ATOMIC) can fail under memory pressure. When it does,
> > overwrite the slot
> > + * in place with IOVA_DEFERRED -- an in-place store needs no node
> > allocation and
> > + * so cannot fail -- which keeps the address range reserved (the
> > allocator's gap
> > + * search treats the non-NULL slot as occupied) and lets the
> > struct iova be
> > + * freed now. The marker is erased later by iova_drain_deferred().
> > + */
> > static void remove_iova(struct iova_domain *iovad, struct iova
> > *iova)
> > {
> > - MA_STATE(mas, &iovad->mtree, iova->pfn_lo, iova->pfn_hi);
> > + unsigned long pfn_lo = iova->pfn_lo, pfn_hi = iova-
> > >pfn_hi;
> > +
> > + MA_STATE(mas, &iovad->mtree, pfn_lo, pfn_hi);
> >
> > assert_spin_locked(&iovad->iova_lock);
> >
> > - if (iova->pfn_lo < iovad->dma_32bit_pfn)
> > + if (pfn_lo < iovad->dma_32bit_pfn)
> > iovad->max32_alloc_size = iovad->dma_32bit_pfn;
> >
> > - mas_store_gfp(&mas, NULL, GFP_ATOMIC);
> > + if (iova_kunit_defer_erase || mas_store_gfp(&mas, NULL,
> > GFP_ATOMIC)) {
> > + /* Erase failed: mark the slot in place and defer
> > removal. */
> > + mas_set_range(&mas, pfn_lo, pfn_hi);
> > + if (WARN_ON_ONCE(mas_store_gfp(&mas,
> > IOVA_DEFERRED, GFP_ATOMIC)))
> > + return; /* in-place store cannot
> > fail; entry stays put */
> /*
> * <-- deferred_lo/hi not updated and range stays
> * occupied for ever?
> */
If the in-place store fails, then updating
deferred_lo / hi won't help, since the
sweep later on will be unable to identify
this area.
Freeing the iova at that point would leave
the maple tree with a pointer into memory
that could be re-used for something else.
We need to either return early and warn,
or panic.
I'm fine with either.
Robin, do you have a preference here?
> > + if (pfn_lo < iovad->deferred_lo)
> > + iovad->deferred_lo = pfn_lo;
> > + if (pfn_hi > iovad->deferred_hi)
> > + iovad->deferred_hi = pfn_hi;
> > + free_iova_mem(iova);
> <--- only reached if second store succeeds?
> > + return;
> > + }
> >
--
All Rights Reversed.