Re: Linux: DMA-after-unmap race in ZCRX via netif_rxq_cleanup_unlease() ordering inversion (netkit + page_pool)
From: Daniel Borkmann
Date: Thu May 28 2026 - 18:31:38 EST
Hi Ahmed,
On 5/28/26 1:33 AM, Jakub Kicinski wrote:
Dropping the security lists: security lists are for private discussions,
so it's utterly pointless to CC both them and LKML. Not to mention
that this bug only exists in -rc kernels.
Adding relevant developers. Moving security@ to Bcc.
Thanks for the report! I think a fix could look like the one below. Before
submitting it, though, I would prefer that David check it against real HW
that supports mem providers, e.g. a BCM NIC:
diff --git a/net/core/netdev_rx_queue.c b/net/core/netdev_rx_queue.c
index de4dac4c88b3..00a7011eb4d5 100644
--- a/net/core/netdev_rx_queue.c
+++ b/net/core/netdev_rx_queue.c
@@ -338,12 +338,12 @@ void __netif_mp_uninstall_rxq(struct netdev_rx_queue *rxq,
 void netif_rxq_cleanup_unlease(struct netdev_rx_queue *phys_rxq,
 			       struct netdev_rx_queue *virt_rxq)
 {
-	struct pp_memory_provider_params *p = &phys_rxq->mp_params;
 	unsigned int rxq_idx = get_netdev_rx_queue_index(phys_rxq);
+	struct pp_memory_provider_params p = phys_rxq->mp_params;
 
-	if (!p->mp_ops)
+	if (!p.mp_ops)
 		return;
 
-	__netif_mp_uninstall_rxq(virt_rxq, p);
-	__netif_mp_close_rxq(phys_rxq->dev, rxq_idx, p);
+	__netif_mp_close_rxq(phys_rxq->dev, rxq_idx, &p);
+	__netif_mp_uninstall_rxq(virt_rxq, &p);
 }
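The patch makes two changes: it swaps the call order, and it copies `mp_params` by value. The userspace C model below is meant to show why both are needed. It assumes that `__netif_mp_close_rxq()` clears the queue's `mp_params` before restarting the queue. If that holds, a pointer into `phys_rxq->mp_params` would hand cleared params to the uninstall step once close runs first. All `*_model` types and functions are hypothetical stand-ins, not kernel APIs, and `virt_rxq` is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-ins, not the kernel's types. */
struct mp_ops_model {
	int id;
};

struct mp_params_model {
	const struct mp_ops_model *mp_ops;
	void *mp_priv;
};

struct rxq_model {
	struct mp_params_model mp_params;
};

static char trace[4];   /* 'C' = queue closed, 'U' = provider uninstalled */
static size_t trace_len;
static const struct mp_ops_model *uninstall_saw_ops;

/* Assumed behaviour of __netif_mp_close_rxq(): clear the queue's provider
 * params, then restart the queue without the provider. */
static void model_close_rxq(struct rxq_model *phys)
{
	assert(trace_len < sizeof(trace) - 1);
	phys->mp_params.mp_ops = NULL;
	phys->mp_params.mp_priv = NULL;
	trace[trace_len++] = 'C';
}

/* Stand-in for __netif_mp_uninstall_rxq(): records the ops it was handed,
 * because the real one needs them to tear down (unmap) the provider. */
static void model_uninstall_rxq(const struct mp_params_model *p)
{
	assert(trace_len < sizeof(trace) - 1);
	uninstall_saw_ops = p->mp_ops;
	trace[trace_len++] = 'U';
}

/* Shape of the patched netif_rxq_cleanup_unlease(). */
static void cleanup_unlease_copy(struct rxq_model *phys)
{
	struct mp_params_model p = phys->mp_params;   /* snapshot by value */

	if (!p.mp_ops)
		return;
	model_close_rxq(phys);     /* stop the queue first ... */
	model_uninstall_rxq(&p);   /* ... then unmap, using the snapshot */
}

/* Same order, but keeping a pointer into the queue's params. */
static void cleanup_unlease_pointer(struct rxq_model *phys)
{
	struct mp_params_model *p = &phys->mp_params;

	if (!p->mp_ops)
		return;
	model_close_rxq(phys);
	model_uninstall_rxq(p);    /* p->mp_ops was cleared by close */
}

static const struct mp_ops_model zcrx_ops_model = { 1 };

/* Runs one variant on a fresh queue; returns the ops uninstall observed. */
static const struct mp_ops_model *
run_unlease_variant(void (*cleanup)(struct rxq_model *))
{
	struct rxq_model phys = { { &zcrx_ops_model, NULL } };

	memset(trace, 0, sizeof(trace));
	trace_len = 0;
	uninstall_saw_ops = NULL;
	cleanup(&phys);
	return uninstall_saw_ops;
}
```

Both variants close before they uninstall, so `trace` reads "CU" in each case. Only the by-value snapshot still carries the provider ops into the uninstall step: `run_unlease_variant(cleanup_unlease_copy)` returns `&zcrx_ops_model`, while the pointer variant returns NULL.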
On Wed, 27 May 2026 23:53:45 +0100 Prénom? Ahmed wrote:
Hello,
I would like to report a source-proven teardown ordering bug in the Linux
kernel that can lead to a DMA-after-unmap race condition involving ZCRX
(io_uring zero-copy receive), page_pool, and netkit queue leasing.
Reporter: Ahmed Abdelmoemen
Discovery Date: 2026-05-26
Kernel Version: Linux 7.1.0-rc3
Executive Summary
A logic error in `netif_rxq_cleanup_unlease()` causes the DMA mappings of the
ZCRX memory provider to be revoked **before** the physical NIC RX queue is
stopped. This opens a race window during netkit queue lease teardown in
which the physical device's NAPI can consume stale `net_iov` entries from
the page_pool alloc cache whose `dma_addr` has already been cleared to 0.
The ordering inversion is fully proven at the source level. However, I have
**not** performed runtime verification, so actual memory corruption or a
successful DMA to address 0 has **not** been demonstrated. Whether either
occurs depends on the hardware and the driver.
The bug is reachable with `CAP_NET_ADMIN` (common in container
environments) when using netkit with ZCRX.
Root Cause
In `net/core/netdev_rx_queue.c:347-348`:
```c
__netif_mp_uninstall_rxq(virt_rxq, p);  // DMA unmap + dma_addr=0
__netif_mp_close_rxq(...);              // queue stop + NAPI disable (TOO LATE)
```
This inverts the correct ordering used in normal device unregistration and
io_uring close paths (stop first, then unmap).
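The race reduces to a toy interleaving in which one NAPI refill lands between the two teardown steps. The userspace sketch below models that interleaving. None of the `*_model` names exist in the kernel. Following the report, the model assumes that uninstall clears each cached `net_iov`'s `dma_addr` to 0 and that a queue which is still running keeps refilling its RX ring from the page_pool alloc cache.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types; not the kernel's net_iov/page_pool structs. */
struct niov_model {
	unsigned long dma_addr;
};

struct pool_cache_model {
	struct niov_model cache[4];
	size_t count;
};

struct nic_rxq_model {
	int stopped;
	unsigned long ring[8];   /* DMA addresses posted to the RX ring */
	size_t nposted;
};

/* Provider uninstall: unmap every cached buffer (dma_addr cleared to 0). */
static void model_unmap_all(struct pool_cache_model *pp)
{
	for (size_t i = 0; i < pp->count; i++)
		pp->cache[i].dma_addr = 0;
}

/* One NAPI refill: while the queue runs, pop a cached buffer and post its
 * DMA address to the RX ring for the NIC to write into. */
static void model_napi_refill(struct nic_rxq_model *q,
			      struct pool_cache_model *pp)
{
	if (q->stopped || pp->count == 0)
		return;
	assert(q->nposted < sizeof(q->ring) / sizeof(q->ring[0]));
	q->ring[q->nposted++] = pp->cache[--pp->count].dma_addr;
}

/* Runs a teardown with one NAPI refill in the gap between the two steps;
 * returns how many ring entries carry dma_addr == 0. */
static size_t model_teardown(int stop_first)
{
	struct pool_cache_model pp = {
		{ { 0x1000 }, { 0x2000 }, { 0x3000 }, { 0x4000 } }, 4
	};
	struct nic_rxq_model q = { 0 };
	size_t zero = 0;

	if (stop_first) {
		q.stopped = 1;                /* close: queue stopped */
		model_napi_refill(&q, &pp);   /* NAPI in the gap: no-op */
		model_unmap_all(&pp);         /* uninstall */
	} else {
		model_unmap_all(&pp);         /* uninstall first (current order) */
		model_napi_refill(&q, &pp);   /* NAPI in the gap: posts addr 0 */
		q.stopped = 1;                /* close comes too late */
	}
	for (size_t i = 0; i < q.nposted; i++)
		if (q.ring[i] == 0)
			zero++;
	return zero;
}
```

With the current order, `model_teardown(0)` returns 1: one descriptor was posted with DMA address 0. With stop-first ordering, `model_teardown(1)` returns 0. The model shows only the ordering window. It makes no claim about what a real NIC or IOMMU does with such a descriptor.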
Impact
- *Potential:* a NIC DMA write to address 0 (or through stale mappings
  when the IOMMU invalidates lazily), leading to memory corruption.
- *Requirements:* CAP_NET_ADMIN, netkit queue leasing, and ZCRX installed
  on the leased queue.
- *Current Status:* No runtime PoC or crash reproduction yet. The race
  window exists in theory, but its practical exploitability still needs
  confirmation.
I am attaching the full analysis.
Proposed Fix
(attached as image.png)
I am happy to provide more details or assist with testing.