Re: [RFC PATCH v2 2/4] mm/zsmalloc: introduce zs_free_deferred() for async handle freeing

From: Nhat Pham

Date: Thu Apr 23 2026 - 12:54:17 EST


On Tue, Apr 21, 2026 at 2:42 PM Barry Song <baohua@xxxxxxxxxx> wrote:
> On Wed, Apr 22, 2026 at 3:47 AM Nhat Pham <nphamcs@xxxxxxxxx> wrote:
> >
> > On Tue, Apr 21, 2026 at 5:16 AM Wenchao Hao <haowenchao22@xxxxxxxxx> wrote:
> > >
> > > zs_free() is expensive due to internal locking (pool->lock, class->lock)
> > > and potential zspage freeing. On the process exit path, the slow
> > > zs_free() blocks memory reclamation, delaying overall memory release.
> > > This has been reported to significantly impact Android low-memory
> > > killing where slot_free() accounts for over 80% of the total swap
> > > entry freeing cost.
> > >
> > > Introduce zs_free_deferred() which queues handles into a fixed-size
> > > per-pool array for later processing by a workqueue. This allows callers
> > > to defer the expensive zs_free() and return quickly, so the process
> > > exit path can release memory faster. The array capacity is derived from
> > > a 128MB uncompressed data budget (128MB >> PAGE_SHIFT entries), which
> > > scales naturally with PAGE_SIZE. When the array reaches half capacity,
> > > the workqueue is scheduled to drain pending handles.
> > >
> > > zs_free_deferred() uses spin_trylock() to access the deferred queue.
> > > If the lock is contended (e.g. drain in progress) or the queue is full,
> > > it falls back to synchronous zs_free() to guarantee correctness.
> > >
> > > Also introduce zs_free_deferred_flush() for use during pool teardown to
> > > ensure all pending handles are freed.
> >
> > Hmmm per-pool workqueue.
> >
> > Does that mean that if you only have one zs pool (in the case of
> > zswap, or if you only have one zram device), you'll have less
> > concurrency in freeing up zsmalloc memory for process teardown? Would
> > this be problematic?
>
> I believe so, as reported in the original email from Lei and Zhiguo,
> which proposed introducing a swap entries list for async free.
>
> >
> > I think Kairui was also suggesting per-cpu-fying these batches/queues.
>
> I guess a per-size-class workqueue might strike a balance
> between scalability and reducing lock contention across
> multiple classes, where the locks actually reside.

Sounds good! Let the numbers decide :)

>
> Thanks
> Barry
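
For anyone following along, here is a rough userspace sketch of the
queueing behaviour as I read it from the patch description quoted above:
a fixed-size array sized from the 128MB budget, a trylock on the enqueue
path, fallback to a synchronous free when the lock is contended or the
array is full, and a drain request once the array is half full. This is
not the patch code. Every demo_* name is hypothetical, 4KB pages are
assumed, a C11 atomic_flag stands in for spin_trylock(), and demo_drain()
stands in for the workqueue callback.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_BUDGET_BYTES (128UL << 20)  /* 128MB budget from the description */
#define DEMO_PAGE_SHIFT   12             /* assumption: 4KB pages */
#define DEMO_CAPACITY     (DEMO_BUDGET_BYTES >> DEMO_PAGE_SHIFT)

/* Hypothetical per-pool deferred queue; not the layout used by the patch. */
struct demo_queue {
	atomic_flag lock;        /* stand-in for the queue spinlock */
	size_t count;            /* handles currently pending */
	bool drain_scheduled;    /* stand-in for "work item queued" */
	atomic_ulong freed;      /* handles actually freed so far */
	unsigned long handles[DEMO_CAPACITY];
};

void demo_queue_init(struct demo_queue *q)
{
	atomic_flag_clear(&q->lock);
	atomic_init(&q->freed, 0);
	q->count = 0;
	q->drain_scheduled = false;
}

/* Stand-in for the expensive synchronous zs_free(). */
void demo_free_sync(struct demo_queue *q, unsigned long handle)
{
	(void)handle;
	atomic_fetch_add(&q->freed, 1);
}

/* Returns true if the handle was queued, false if it was freed inline. */
bool demo_free_deferred(struct demo_queue *q, unsigned long handle)
{
	/* Trylock: if someone else holds the lock, do not wait. */
	if (atomic_flag_test_and_set(&q->lock)) {
		demo_free_sync(q, handle);
		return false;
	}

	/* Queue full: release the lock and fall back to the sync path. */
	if (q->count == DEMO_CAPACITY) {
		atomic_flag_clear(&q->lock);
		demo_free_sync(q, handle);
		return false;
	}

	q->handles[q->count++] = handle;

	/* Ask for a drain once the array reaches half capacity. */
	if (q->count >= DEMO_CAPACITY / 2)
		q->drain_scheduled = true;

	atomic_flag_clear(&q->lock);
	return true;
}

/* Stand-in for the workqueue callback; returns how many handles it freed. */
size_t demo_drain(struct demo_queue *q)
{
	size_t i, n;

	while (atomic_flag_test_and_set(&q->lock))
		; /* spin until the lock is available */

	n = q->count;
	for (i = 0; i < n; i++)
		demo_free_sync(q, q->handles[i]);
	q->count = 0;
	q->drain_scheduled = false;

	atomic_flag_clear(&q->lock);
	return n;
}
```

With one such queue per pool, every enqueue and the drain go through the
same lock, which is where the concurrency question above comes from. A
per-CPU or per-size-class variant would presumably instantiate one of
these per CPU or per class instead.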