Re: [PATCH v2 09/12] mm, swap: use the swap table to track the swap count

From: Kairui Song

Date: Sun Feb 01 2026 - 22:28:24 EST


On Thu, Jan 29, 2026 at 4:28 PM YoungJun Park <youngjun.park@xxxxxxx> wrote:
>
> On Wed, Jan 28, 2026 at 05:28:33PM +0800, Kairui Song wrote:
> > From: Kairui Song <kasong@xxxxxxxxxxx>
>
> > index bfafa637c458..751430e2d2a5 100644
> > --- a/mm/swap.h
> > +++ b/mm/swap.h
> > @@ -37,6 +37,7 @@ struct swap_cluster_info {
> > u8 flags;
> > u8 order;
> > atomic_long_t __rcu *table; /* Swap table entries, see mm/swap_table.h */
> > + unsigned long *extend_table; /* For large swap count, protected by ci->lock */
>
> I assume using 'int *' is to save memory on 64-bit architectures (8 bytes ->
> 4 bytes per entry), which aligns with swp_tb_get_count() returning an int.

Right, I used long because I wasn't sure whether we would ever have a
counter larger than int, but a folio's refcount is already an int, so
using int * should be enough here. Thanks for the suggestion!

>
> Regarding the extended reference table.
> While I agree that a simple array is better for speed, readability and so on, the
> 2KB overhead (assuming SWAPFILE_CLUSTER=256) might be significant in
> constrained environments when only a few entries overflow SWP_TB_COUNT_MAX.

Indeed, but note that before this change we also had a 4K overhead if
only one or a few entries overflowed CONT_MAX in a given range. That 4K
covered a larger range though. And entries with very large counts seem
to be very rare in practice.
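
To put rough numbers on the per-cluster cost, here is a minimal
userspace sketch (not kernel code). It assumes SWAPFILE_CLUSTER is 256
as in the discussion above, models a 64-bit long as uint64_t and an int
as uint32_t, and the helper names are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Entries per cluster; 256 is the value assumed in this thread. */
#define SWAPFILE_CLUSTER 256

/* Bytes needed for one per-cluster extended table with the given entry size. */
static size_t ext_table_bytes(size_t entry_size)
{
	assert(entry_size > 0);
	return (size_t)SWAPFILE_CLUSTER * entry_size;
}

/* Hypothetical helpers: a 64-bit long is modeled as uint64_t, an int as uint32_t. */
static size_t ext_table_bytes_long(void)
{
	return ext_table_bytes(sizeof(uint64_t));
}

static size_t ext_table_bytes_int(void)
{
	return ext_table_bytes(sizeof(uint32_t));
}
```

Under these assumptions the long-based table costs 2048 bytes per
cluster and the int-based one 1024 bytes, both below the old 4K.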

>
> Have you considered using a resizable hash table (or some other structure)
> instead? I am curious whether this approach could be applied
> as a future optimization after the current code is merged.

Yeah I do have several ideas about how to optimize it :)

For now, using a single plain extended table simplifies things a lot.
I tested some common workloads with SWP_TB_COUNT_MAX == 2, and the
memory consumption and performance overhead look good.
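
For readers following along, the general idea of a plain extended
table can be sketched as a userspace toy model. This is not the code
from the patch: struct toy_cluster and the toy_count_*() helpers are
hypothetical, the split between "inline" and "extended" counts is an
assumption made for illustration, and locking (ci->lock in the real
code) is left out entirely:

```c
#include <assert.h>
#include <stdlib.h>

#define SWAPFILE_CLUSTER 256
#define SWP_TB_COUNT_MAX 2	/* deliberately tiny, like the test setup above */

/* Toy stand-in for a cluster; the real struct swap_cluster_info differs. */
struct toy_cluster {
	unsigned char inline_count[SWAPFILE_CLUSTER];	/* models the in-table count */
	int *extend_table;	/* NULL until some entry overflows */
};

/* Bump the count of one entry; returns 0 on success, -1 if allocation fails. */
static int toy_count_inc(struct toy_cluster *ci, unsigned int off)
{
	assert(off < SWAPFILE_CLUSTER);

	if (ci->inline_count[off] < SWP_TB_COUNT_MAX) {
		ci->inline_count[off]++;
		return 0;
	}

	/* Inline count is saturated: allocate the whole table on first overflow. */
	if (!ci->extend_table) {
		ci->extend_table = calloc(SWAPFILE_CLUSTER,
					  sizeof(*ci->extend_table));
		if (!ci->extend_table)
			return -1;
	}
	ci->extend_table[off]++;
	return 0;
}

/* Total count of one entry: the inline part plus any overflow. */
static int toy_count_get(const struct toy_cluster *ci, unsigned int off)
{
	int count;

	assert(off < SWAPFILE_CLUSTER);
	count = ci->inline_count[off];
	if (ci->extend_table)
		count += ci->extend_table[off];
	return count;
}
```

In this model a cluster pays for the table only once an entry goes past
SWP_TB_COUNT_MAX, and lookups stay a plain array index.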

A later idea is that we might be able to move the swap count into the
folio struct for cached folios and remove the anon shadow completely,
so that we only store the count for swapped-out entries. That way we
would always have zero overhead even if the swap count is very large.
That requires some tweaks on the LRU side.

Or, if that's not doable, we can use other ideas like the one you suggested.