Re: [RFC PATCH v2 0/4] mm/zsmalloc: reduce zs_free() latency on swap release path
From: Xueyuan Chen
Date: Tue Apr 21 2026 - 20:35:45 EST
On Tue, Apr 21, 2026 at 11:25:17AM -0700, Nhat Pham wrote:
[...]
>Hmm, free_zspage() and kmem_cache_free().
>
>* kmem_cache_free() is just handle freeing. Bulk-freeing?
>
>* free_zspage() looks like just ordinary teardown work :( Seems like
>we're not spinning any lock here - we just try lock the backing pages,
>and the rest is normal work. Not sure how to optimize this - perhaps
>deferring is the only way.
>
>
Hi Nhat,
Currently, free_zspage() is called with class->lock held, and it
eventually invokes folio_put(), which may in turn acquire zone->lock.
This nests zone->lock under class->lock: when multiple CPUs contend for
the same class->lock and the current holder is stalled waiting for
zone->lock, the class->lock hold time grows significantly and every
other CPU queued on it waits correspondingly longer.
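Roughly, the critical section looks like this (a simplified sketch of
zs_free(); locking details and helper signatures differ across kernel
versions, and error handling is omitted):

void zs_free(struct zs_pool *pool, unsigned long handle)
{
	...
	spin_lock(&class->lock);
	obj_free(class->size, obj);
	fullness = fix_fullness_group(class, zspage);
	if (fullness == ZS_INUSE_RATIO_0)
		/*
		 * Tears down the backing pages: this ends up in
		 * folio_put()/free_unref_page(), which may take
		 * zone->lock - all while class->lock is still held.
		 */
		free_zspage(pool, class, zspage);
	spin_unlock(&class->lock);
	cache_free_handle(pool, handle);	/* the kmem_cache_free() seen in the trace */
	...
}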
Here is ftrace data comparing the normal case with the contended case:
under contention, the time spent in queued_spin_lock_slowpath() jumps
from ~1.3us to over 30us, which significantly increases the total
latency of zs_free().
7)               |  zs_free() {
7)   0.220 us    |    _raw_read_lock();
7)               |    _raw_spin_lock() {
7)   1.320 us    |      queued_spin_lock_slowpath();
7)   1.820 us    |    }
7)   0.170 us    |    _raw_read_unlock();
7)   0.170 us    |    obj_free();
7)   0.190 us    |    fix_fullness_group();
7)   0.150 us    |    _raw_spin_unlock();
7)   0.170 us    |    kmem_cache_free();
7)   4.610 us    |  }
---------------------------------------------------------
7)               |  zs_free() {
7)   0.230 us    |    _raw_read_lock();
7)               |    _raw_spin_lock() {
7) + 30.100 us   |      queued_spin_lock_slowpath();
7) + 30.600 us   |    }
7)   0.200 us    |    _raw_read_unlock();
7)   0.170 us    |    obj_free();
7)   0.170 us    |    fix_fullness_group();
7)   0.170 us    |    _raw_spin_unlock();
7)   0.210 us    |    kmem_cache_free();
7) + 33.850 us   |  }
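One way the deferral idea could look is to do only the list/stat
manipulation under class->lock and move the actual page freeing to
after the lock is dropped, roughly along these lines (illustrative
only; it glosses over trylock_zspage()/kick_deferred_free(), migration
handling, and any lock assertions inside the existing helpers, whose
exact signatures also vary between versions):

	bool empty = false;

	spin_lock(&class->lock);
	obj_free(class->size, obj);
	fullness = fix_fullness_group(class, zspage);
	if (fullness == ZS_INUSE_RATIO_0) {
		/*
		 * Only unlink the zspage from the class here; keep the
		 * expensive page teardown out of the critical section.
		 */
		remove_zspage(class, zspage);
		empty = true;
	}
	spin_unlock(&class->lock);

	if (empty)
		/*
		 * folio_put() and any zone->lock work now run without
		 * class->lock held.
		 */
		__free_zspage(pool, class, zspage);

The tricky part is keeping such a split safe against concurrent
migration/compaction and the existing deferred-free path.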
Best regards,
Xueyuan