Re: [PATCH 3/3] iova: defer maple tree erase on GFP_ATOMIC failure
From: Ashok Raj
Date: Thu Jun 25 2026 - 11:53:54 EST
On Tue, Jun 23, 2026 at 11:07:36PM -0400, Rik van Riel wrote:
Hi Rik,
Thanks for the v4 redesign — the in-place marker approach is much cleaner than
the delayed_work/llist scheme from v3, and it directly addresses the
retry-forever concern I raised earlier.
[snip]
> +/*
> + * Remove an IOVA entry from the maple tree and free it.
> + *
> + * This runs in atomic context (DMA map/unmap can be called from hardirq,
> + * softirq, or with spinlocks held) and must not fail. Erasing an entry can
> + * require a maple tree node for rebalancing, and mas_store_gfp(NULL,
> + * GFP_ATOMIC) can fail under memory pressure. When it does, overwrite the slot
> + * in place with IOVA_DEFERRED -- an in-place store needs no node allocation and
> + * so cannot fail -- which keeps the address range reserved (the allocator's gap
> + * search treats the non-NULL slot as occupied) and lets the struct iova be
> + * freed now. The marker is erased later by iova_drain_deferred().
> + */
> static void remove_iova(struct iova_domain *iovad, struct iova *iova)
> {
> - MA_STATE(mas, &iovad->mtree, iova->pfn_lo, iova->pfn_hi);
> + unsigned long pfn_lo = iova->pfn_lo, pfn_hi = iova->pfn_hi;
> +
> + MA_STATE(mas, &iovad->mtree, pfn_lo, pfn_hi);
>
> assert_spin_locked(&iovad->iova_lock);
>
> - if (iova->pfn_lo < iovad->dma_32bit_pfn)
> + if (pfn_lo < iovad->dma_32bit_pfn)
> iovad->max32_alloc_size = iovad->dma_32bit_pfn;
>
> - mas_store_gfp(&mas, NULL, GFP_ATOMIC);
> + if (iova_kunit_defer_erase || mas_store_gfp(&mas, NULL, GFP_ATOMIC)) {
> + /* Erase failed: mark the slot in place and defer removal. */
> + mas_set_range(&mas, pfn_lo, pfn_hi);
> + if (WARN_ON_ONCE(mas_store_gfp(&mas, IOVA_DEFERRED, GFP_ATOMIC)))
> + return; /* in-place store cannot fail; entry stays put */
/*
* <-- deferred_lo/hi not updated and range stays
* occupied for ever?
*/
> + if (pfn_lo < iovad->deferred_lo)
> + iovad->deferred_lo = pfn_lo;
> + if (pfn_hi > iovad->deferred_hi)
> + iovad->deferred_hi = pfn_hi;
> + free_iova_mem(iova);
<--- only reached if second store succeeds?
> + return;
> + }
> +
> + free_iova_mem(iova);
> +
> + /* A successful erase means memory is available; clear any backlog. */
> + if (unlikely(iova_has_deferred(iovad)))
> + iova_drain_deferred(iovad);
> }
>
If the in-place marker store fails and WARN_ON_ONCE fires:
- free_iova_mem(iova) is never reached — the struct iova leaks.
- deferred_lo/hi is not updated — the address range stays permanently
occupied with no way to reclaim it.
- After the first occurrence, WARN_ON_ONCE suppresses the warning on all
subsequent hits, so further failures on this path are completely silent.
I understand the comment says this cannot happen, and if that guarantee holds
then it's fine. But if it can, a BUG_ON might be a cleaner way to express
that invariant — it makes the intent unambiguous rather than leaving the
impression the failure is being gracefully handled when it isn't.
Happy to be wrong if I'm missing something about the maple tree's in-place store
guarantees.
Cheers,
Ashok