Re: [PATCH v6 0/4] mm/zsmalloc: reduce lock contention in zs_free()

From: Andrew Morton

Date: Sun Jun 28 2026 - 00:36:12 EST


On Fri, 26 Jun 2026 09:49:59 +0800 Wenchao Hao <haowenchao22@xxxxxxxxx> wrote:

> From: Wenchao Hao <haowenchao@xxxxxxxxxx>
>
> This series reduces lock contention in zs_free(), which dominates the
> unmap path under memory pressure on Android (LMK kills) and on x86
> servers running zswap-heavy workloads.
>
> The current zs_free() takes pool->lock (rwlock, read side) just to
> look up the size_class for a handle, then takes class->lock and holds
> it across __free_zspage() which can call into the buddy allocator and
> acquire zone->lock. Two costs follow:
>
>   * pool->lock reader-counter cacheline bouncing among concurrent
>     zs_free() callers.
>   * class->lock held across folio_put(), so any zone->lock wait
>     fans out to every other zs_free() on the same class.
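
For readers following along, the second cost and the usual remedy can be
shown in userspace. This is only a rough sketch of the general "detach
under the lock, free after unlocking" pattern, not the code from the
series: the sketch_* names are hypothetical, a pthread mutex stands in
for class->lock, and free() stands in for the folio_put() call into the
page allocator.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a zspage: just a list node. */
struct sketch_page {
	struct sketch_page *next;
};

/* Hypothetical stand-in for a size class with its own lock. */
struct sketch_class {
	pthread_mutex_t lock;      /* plays the role of class->lock */
	struct sketch_page *pages; /* list head, protected by lock */
	int nr_pages;              /* protected by lock */
};

static void sketch_class_init(struct sketch_class *c)
{
	pthread_mutex_init(&c->lock, NULL);
	c->pages = NULL;
	c->nr_pages = 0;
}

/* Allocate outside the lock, link under it. Returns 0 on success. */
static int sketch_add_page(struct sketch_class *c)
{
	struct sketch_page *p = malloc(sizeof(*p));

	if (!p)
		return -1;
	pthread_mutex_lock(&c->lock);
	p->next = c->pages;
	c->pages = p;
	c->nr_pages++;
	pthread_mutex_unlock(&c->lock);
	return 0;
}

/*
 * Unlink one page while holding the lock, then release the memory
 * only after the lock is dropped, so a slow allocator call cannot
 * stall other threads waiting on the same class.
 * Returns 1 if a page was released, 0 if the list was empty.
 */
static int sketch_release_page(struct sketch_class *c)
{
	struct sketch_page *victim;
	int found;

	pthread_mutex_lock(&c->lock);
	victim = c->pages;
	found = (victim != NULL);
	if (found) {
		c->pages = victim->next;
		c->nr_pages--;
	}
	pthread_mutex_unlock(&c->lock);

	free(victim); /* free(NULL) is a no-op */
	return found;
}
```

The lock hold time then covers only the list manipulation, which is the
property the cover letter is after for class->lock.
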
>
> The series tackles both:
>
>   Patch 1: encode size_class index into obj alongside PFN and obj_idx,
>            so zs_free() can locate the class without pool->lock.
>   Patch 2: drop pool->lock from zs_free() on 64-bit; 32-bit unchanged.
>   Patch 3: move zspage page-freeing out of class->lock.
>   Patch 4: document the three free_zspage helper variants that result
>            from the split in patch 3.
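
To make the patch 1 idea concrete, here is a minimal sketch of packing a
class index into a 64-bit object value next to the PFN and the object
index. The field widths, the field order and the sketch_* names are
assumptions picked for illustration; the real zsmalloc layout is derived
from the kernel configuration and may well differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field widths, for illustration only. */
#define SKETCH_OBJ_IDX_BITS 10u
#define SKETCH_CLASS_BITS    8u
#define SKETCH_PFN_BITS     (64u - SKETCH_CLASS_BITS - SKETCH_OBJ_IDX_BITS)

#define SKETCH_OBJ_IDX_MASK ((UINT64_C(1) << SKETCH_OBJ_IDX_BITS) - 1)
#define SKETCH_CLASS_MASK   ((UINT64_C(1) << SKETCH_CLASS_BITS) - 1)
#define SKETCH_PFN_MASK     ((UINT64_C(1) << SKETCH_PFN_BITS) - 1)

/* Decoded view of an object value. */
struct sketch_loc {
	uint64_t pfn;
	unsigned int class_idx;
	unsigned int obj_idx;
};

/* Pack as [ pfn | class_idx | obj_idx ], most significant field first. */
static uint64_t sketch_encode(uint64_t pfn, unsigned int class_idx,
			      unsigned int obj_idx)
{
	assert(pfn <= SKETCH_PFN_MASK);
	assert(class_idx <= SKETCH_CLASS_MASK);
	assert(obj_idx <= SKETCH_OBJ_IDX_MASK);

	return (pfn << (SKETCH_CLASS_BITS + SKETCH_OBJ_IDX_BITS)) |
	       ((uint64_t)class_idx << SKETCH_OBJ_IDX_BITS) |
	       (uint64_t)obj_idx;
}

/* Recover all three fields from the value alone, with no lookup. */
static struct sketch_loc sketch_decode(uint64_t obj)
{
	struct sketch_loc loc;

	loc.obj_idx = (unsigned int)(obj & SKETCH_OBJ_IDX_MASK);
	loc.class_idx = (unsigned int)((obj >> SKETCH_OBJ_IDX_BITS) &
				       SKETCH_CLASS_MASK);
	loc.pfn = obj >> (SKETCH_CLASS_BITS + SKETCH_OBJ_IDX_BITS);
	return loc;
}
```

Because the class index travels with the value, the free path can pick
the class lock directly from the decoded index, with no pool-wide lookup
in between.
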
>
> Performance results:
>
> Test: each process independently mmap 256MB, write data, madvise
> MADV_PAGEOUT to swap out via zram (lzo-rle), then concurrent munmap.
>
> Raspberry Pi 4B (4-core ARM64 Cortex-A72):
>
>   mode       Base      Patched   Speedup
>   single     59.0ms    56.0ms    1.05x
>   multi 2p   94.6ms    66.7ms    1.42x
>   multi 4p   202.9ms   110.6ms   1.83x
>
> x86 (20-core Intel i7-12700, 16 concurrent processes):
>
>   mode       Base      Patched   Speedup
>   single     11.7ms    9.8ms     1.19x
>   multi 2p   24.1ms    17.2ms    1.40x
>   multi 4p   63.0ms    45.3ms    1.39x

Well that's a nice result.

Sashiko AI review said .... nothing. I don't recall seeing that
before ;)

I'll add this series to mm.git for the next step, thanks.