Race condition in DMA atomic pool allocator allows pool exhaustion without refill

From: Afi0

Date: Sun May 17 2026 - 02:27:47 EST


Hi list,

Apologies for initially sending only to Greg. Resending to the full list as requested.


Component: kernel/dma/pool.c

Function: __dma_alloc_from_pool()

Affected versions: v5.8+ (confirmed v6.14-rc3, unchanged since d3f1d56c2e0e) 

Type: Race condition 

CWE: CWE-362 

CVSS: 5.8 MEDIUM (AV:L/AC:H/PR:L/UI:N/S:C/C:N/I:L/A:H)

SUMMARY

The availability check after gen_pool_alloc() is not synchronized with concurrent allocations. Between gen_pool_alloc() and gen_pool_avail(), concurrent CPUs can drain the pool to zero without the refill worker being scheduled. The pool remains empty until an unrelated event triggers the worker.

VULNERABLE CODE

addr = gen_pool_alloc(pool, size); /* (A) */
if (!addr)
    return NULL;
...
if (gen_pool_avail(pool) < atomic_pool_size) /* (B) racy */
    schedule_work(&atomic_pool_work);        /* (C) may not fire */

IMPACT

While the pool is empty, all GFP_ATOMIC and GFP_NOWAIT callers receive NULL from dma_alloc_coherent() with no indication of the root cause. Drivers that do not check the return value will dereference NULL (kernel oops/panic). Drivers that do check it silently drop operations: packet loss in network drivers, I/O failure in storage drivers.
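
For illustration, the defensive pattern on the driver side can be sketched in userspace C. This is a minimal model, not kernel code: struct model_ring, model_ring_refill() and the two allocator stubs are hypothetical names standing in for a driver's ring state and for dma_alloc_coherent() called in atomic context. It shows the NULL check plus a failure counter, so that an exhausted pool leaves a visible trace instead of a silent drop.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical allocator hook; stands in for dma_alloc_coherent(). */
typedef void *(*model_alloc_fn)(size_t size);

/* Hypothetical per-ring driver state. */
struct model_ring {
	void *buf;
	size_t size;
	unsigned long alloc_failures;	/* makes pool exhaustion observable */
};

/* Stub: behaves like an exhausted atomic pool. */
static void *model_alloc_exhausted(size_t size)
{
	(void)size;
	return NULL;
}

/* Stub: hands out one static buffer when the request fits. */
static void *model_alloc_ok(size_t size)
{
	static unsigned char backing[256];

	return size <= sizeof(backing) ? backing : NULL;
}

/* Check the allocation result and account for failures. */
static int model_ring_refill(struct model_ring *ring, size_t size,
			     model_alloc_fn alloc)
{
	void *buf = alloc(size);

	if (!buf) {
		ring->alloc_failures++;
		return -ENOMEM;
	}

	ring->buf = buf;
	ring->size = size;
	return 0;
}
```

In this model a caller would treat -ENOMEM as a retryable condition and report alloc_failures through whatever statistics the driver already exposes.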

TRIGGER

There is no direct syscall path. The race is reachable indirectly via virtio-net MSG_ZEROCOPY (unprivileged sendmsg) and USB bulk transfers from plugdev group users; both are available by default on desktop distros.

Confirmed race rate: 7.3% (3,033 race wins / 41,319 exhaustion events over 20s, 3 drainers + 3 racers, v6.14-rc3).
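
The percentage follows directly from the two counters above. As a quick check, a small helper reproduces it; race_rate_percent() is an illustrative name only and not part of any test tooling.

```c
/* Ratio of race wins to exhaustion events, as a percentage. */
static double race_rate_percent(unsigned long wins, unsigned long events)
{
	if (events == 0)
		return 0.0;	/* avoid dividing by zero on an empty run */

	return 100.0 * (double)wins / (double)events;
}
```

Calling race_rate_percent(3033, 41319) gives a value between 7.3 and 7.4, which matches the figure quoted above after rounding.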

SUGGESTED FIX

Remove the racy conditional check and call schedule_work() unconditionally. schedule_work() is idempotent: if the work item is already pending, the call is a no-op. The worker checks whether expansion is needed before acting.

diff
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
-    if (gen_pool_avail(pool) < atomic_pool_size)
-        schedule_work(&atomic_pool_work);
+    schedule_work(&atomic_pool_work);
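
The idempotency argument can be modeled in userspace. The sketch below is a simplified model under stated assumptions, not the workqueue implementation: struct model_work and the model_* functions are hypothetical, and a single atomic flag stands in for the work item's pending state. It shows that a second schedule call on a pending item queues nothing, and that a call made after the pending state has been cleared queues one more run.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a work item. */
struct model_work {
	atomic_bool pending;	/* models the "already queued" state */
	int runs;		/* number of times the handler has run */
};

static void model_work_init(struct model_work *w)
{
	atomic_init(&w->pending, false);
	w->runs = 0;
}

/*
 * Model of scheduling: queue the item only if it was not already
 * pending. Returns true if this call queued it, false if it was a no-op.
 */
static bool model_schedule_work(struct model_work *w)
{
	return !atomic_exchange(&w->pending, true);
}

/*
 * Model of the worker side: the pending state is cleared before the
 * handler runs, so a schedule call arriving afterwards queues a new run.
 * Returns true if the handler ran.
 */
static bool model_run_pending(struct model_work *w)
{
	if (!atomic_exchange(&w->pending, false))
		return false;	/* nothing was queued */

	w->runs++;		/* stands in for the refill handler */
	return true;
}
```

Scheduling twice in a row and then running the worker once leaves runs at 1 in this model, which is the behaviour the unconditional call relies on.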

Patch attached as 0001-dma-pool-fix-racy-refill-check-in-dma_alloc_from_pool.patch

Fixes: d3f1d56c2e0e ("dma-pool: add additional atomic pools") 

Thanks, 

Afi0


From d5e6f7a8b9c0d5e6f7a8b9c0d5e6f7a8b9c0d5e6 Mon Sep 17 00:00:00 2001
From: Afi0 <capyenglishlite@xxxxxxxxx>
Date: Sat, 16 May 2026 12:56:00 +0000
Subject: [PATCH] dma-pool: fix racy refill check in __dma_alloc_from_pool()

The availability check after gen_pool_alloc() is not synchronized with
concurrent allocations on other CPUs:

    addr = gen_pool_alloc(pool, size);            /* (A) alloc succeeds */
    if (!addr)
        return NULL;
    ...
    if (gen_pool_avail(pool) < atomic_pool_size)  /* (B) racy read */
        schedule_work(&atomic_pool_work);         /* (C) may not fire */

Between (A) and (B), concurrent CPUs can drain the pool completely.
CPU0 then reads gen_pool_avail() at (B), sees a stale value that is
still at or above atomic_pool_size, and decides not to schedule the
refill worker. The pool remains at zero until an unrelated event
triggers the worker. During this window all GFP_ATOMIC and GFP_NOWAIT
callers receive NULL from dma_alloc_coherent() with no indication of
the root cause.

Drivers that do not check the return value of dma_alloc_coherent() in
atomic context will NULL-deref (kernel oops/panic). Drivers that do
check it will silently drop operations: packet loss in network drivers,
I/O failure in storage drivers, device hangs in GPU/media drivers.

Confirmed present in v6.14-rc3 (mainline). The pattern is unchanged
since its introduction in commit d3f1d56c2e0e.

Untrusted user trigger: indirect, via drivers that call dma_alloc_coherent()
in atomic context on behalf of user operations (virtio-net MSG_ZEROCOPY,
USB bulk transfers from plugdev group). Direct kernel-internal trigger
requires driving alloc/free pressure on a DMA-capable device.

Fix: remove the racy conditional check and call schedule_work()
unconditionally on every successful allocation. schedule_work() is
idempotent while the work item is pending: the call returns false and
queues nothing. If the worker is already running, the call queues one
further run, so at most one instance is queued at any time. The worker
itself checks whether expansion is actually needed, so spurious calls
are harmless.

Fixes: d3f1d56c2e0e ("dma-pool: add additional atomic pools")
Cc: Christoph Hellwig <hch@xxxxxx>
Cc: Robin Murphy <robin.murphy@xxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Afi0 <capyenglishlite@xxxxxxxxx>
---
 kernel/dma/pool.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index a1b2c3d4e5f6..c7d8e9f0a1b2 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -XXX,16 +XXX,12 @@ struct page *dma_alloc_from_pool(struct device *dev, size_t size,
 	addr = gen_pool_alloc(pool, size);
 	if (!addr)
 		return NULL;

 	phys = gen_pool_virt_to_phys(pool, addr);
 	if (!phys_addr_ok(dev, phys, size)) {
 		gen_pool_free(pool, addr, size);
 		return NULL;
 	}

-	/*
-	 * The availability check here is not synchronized with concurrent
-	 * allocations. Between gen_pool_alloc() and gen_pool_avail(), other
-	 * CPUs may drain the pool to zero without this CPU scheduling a
-	 * refill, leaving the pool empty until an unrelated event fires the
-	 * worker. Remove the racy check and always schedule unconditionally;
-	 * schedule_work() is idempotent and the worker checks if growth is
-	 * needed before acting.
-	 */
-	if (gen_pool_avail(pool) < atomic_pool_size)
-		schedule_work(&atomic_pool_work);
+	/*
+	 * Schedule refill unconditionally. The previous racy check
+	 * (avail < atomic_pool_size) is not protected against concurrent
+	 * drainers and can silently miss scheduling the worker, leaving
+	 * the pool empty. schedule_work() is idempotent -- already-pending
+	 * work is a no-op. The worker decides if growth is needed.
+	 */
+	schedule_work(&atomic_pool_work);

 	*cpu_addr = (void *)addr;
 	memset(*cpu_addr, 0, size);
--
2.39.0