Re: [RFC PATCH v2 0/4] mm/zsmalloc: reduce zs_free() latency on swap release path
From: Xueyuan Chen
Date: Sun Apr 26 2026 - 04:50:43 EST
On Sun, Apr 26, 2026 at 12:13:02PM +0800, Wenchao Hao wrote:
[...]
>2. Per-cpu deferred free with lockless buffer swap
>
>Defer zs_free() to per-cpu dynamically-allocated buffers (~2048 entries).
>Enqueue: one array write + WRITE_ONCE under preempt_disable — no lock,
>no atomic. When a buffer is full, schedule a drain worker; overflow falls back
>to sync zs_free().
>
>Drain: allocate a fresh buffer, swap it in, reset count. Since
>the producer stops writing at count==SIZE, the handoff is
>race-free without any lock.
>
>Pseudo-code:
>
> /* enqueue - hot path */
> def = get_cpu_ptr(pool->deferred);
> if (def->count < SIZE) {
>     def->handles[def->count] = handle;
>     WRITE_ONCE(def->count, def->count + 1);
>     if (def->count == SIZE)
>         schedule_work(&pool->drain_work);
> } else {
>     zs_free(pool, handle); /* fallback */
> }
> put_cpu_ptr(pool->deferred);
>
> /* drain - worker */
> for_each_possible_cpu(cpu) {
>     def = per_cpu_ptr(pool->deferred, cpu);
>     if (def->count < SIZE)
>         continue;
>     new_buf = kvmalloc_array(SIZE, sizeof(long));
>     old_buf = def->handles;
>     old_count = def->count;
>     def->handles = new_buf;
>     WRITE_ONCE(def->count, 0);
>     /* now drain old_buf[0..old_count-1] */
>     ...
>     kvfree(old_buf);
> }
>
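Hi Wenchao,

I suspect there is a memory ordering issue in the drain path:

    def->handles = new_buf;
    WRITE_ONCE(def->count, 0);

Nothing orders these two stores. WRITE_ONCE() constrains the compiler's
handling of that one access, but it is not a memory barrier. If the store
that clears def->count to 0 becomes visible first, an enqueue on that CPU
can see count == 0 while still reading the old def->handles, and end up
writing its handle into old_buf, which the worker is draining and will
kvfree().

This race is more likely to be triggered when SIZE is small, presumably
because the buffer fills and is swapped more often. Perhaps we should use
smp_store_release() for the count store to enforce the ordering? The
enqueue side would then need a matching smp_load_acquire() on def->count.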
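
To make the suggested ordering concrete, here is a minimal userspace C11
sketch of the handoff. It is not kernel code and not the patch itself:
atomic_store_explicit(..., memory_order_release) and
atomic_load_explicit(..., memory_order_acquire) stand in for
smp_store_release() and smp_load_acquire(), calloc() stands in for
kvmalloc_array(), and the struct layout, the tiny SIZE, and the helper
names deferred_enqueue() and deferred_swap() are assumptions made up for
illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define SIZE 4 /* assumption: tiny for illustration; the proposal uses ~2048 */

/* Hypothetical stand-in for the per-cpu deferred-free state. */
struct deferred {
	unsigned long *handles; /* plain pointer, published through count */
	atomic_uint count;
};

/* Producer side. Returns false when the caller must fall back to sync free. */
static bool deferred_enqueue(struct deferred *def, unsigned long handle)
{
	/* Acquire pairs with the release store in deferred_swap(). */
	unsigned int n = atomic_load_explicit(&def->count, memory_order_acquire);

	assert(n <= SIZE);
	if (n == SIZE)
		return false;
	def->handles[n] = handle;
	/* Release: the slot write is visible before count is seen to grow. */
	atomic_store_explicit(&def->count, n + 1, memory_order_release);
	return true;
}

/* Drain side. Returns the full old buffer, or NULL if there is nothing to do. */
static unsigned long *deferred_swap(struct deferred *def)
{
	unsigned long *new_buf, *old_buf;

	if (atomic_load_explicit(&def->count, memory_order_acquire) < SIZE)
		return NULL;
	new_buf = calloc(SIZE, sizeof(*new_buf));
	if (!new_buf)
		return NULL; /* keep the full buffer; producer keeps falling back */
	old_buf = def->handles;
	def->handles = new_buf;
	/* Release: the pointer store above cannot become visible after this. */
	atomic_store_explicit(&def->count, 0, memory_order_release);
	return old_buf;
}
```

With this pairing, an enqueue that reads count == 0 is guaranteed to also
see the new handles pointer, and an enqueue that still reads count == SIZE
simply takes the sync fallback.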
Thanks
Xueyuan