Re: [PATCH v2 11/11] mm, swap: merge zeromap into swap table

From: Kairui Song

Date: Sat Apr 18 2026 - 09:35:51 EST


On Sat, Apr 18, 2026 at 8:28 PM YoungJun Park <youngjun.park@xxxxxxx> wrote:
>
> On Fri, Apr 17, 2026 at 02:34:41AM +0800, Kairui Song via B4 Relay wrote:
>
> > *
> > * Usages:
> > *
> > @@ -74,17 +76,22 @@ struct swap_memcg_table {
> > #define SWP_TB_PFN_MARK_BITS 2
> > #define SWP_TB_PFN_MARK_MASK (BIT(SWP_TB_PFN_MARK_BITS) - 1)
> >
> > -/* SWAP_COUNT part for PFN or shadow, the width can be shrunk or extended */
> > -#define SWP_TB_COUNT_BITS min(4, BITS_PER_LONG - SWP_TB_PFN_BITS)
> > +/* SWAP_COUNT and flags for PFN or shadow, width can be shrunk or extended */
> > +#define SWP_TB_FLAGS_BITS min(5, BITS_PER_LONG - SWP_TB_PFN_BITS)
> > +#define SWP_TB_COUNT_BITS (SWP_TB_FLAGS_BITS - 1)
>
> Hi Kairui :)
>
> Would this break the build on 32-bit arches with 40-bit phys
> addrs (MAX_POSSIBLE_PHYSMEM_BITS = 40)?
>
> Architectures I checked.
> - ARM LPAE (CONFIG_ARM_LPAE=y)
> - ARC PAE40 (CONFIG_ARC_HAS_PAE40=y)
> - MIPS XPA (CONFIG_XPA=y)
>
> Calculations.
>
> SWP_TB_PFN_BITS = 28 + 2 = 30
> SWP_TB_FLAGS_BITS = min(5, 32 - 30) = 2
> SWP_TB_COUNT_BITS = 2 - 1 = 1
>
> The BUILD_BUG_ON looks like the real problem. It needs at
> least 3 count values (free/used/overflow).
>
> BUILD_BUG_ON(SWP_TB_COUNT_MAX < 2 || SWP_TB_COUNT_BITS < 2);
>
> Confirmed with a cross build (multi_v7_defconfig + lpae.config).
>
> error: BUILD_BUG_ON failed: SWP_TB_COUNT_MAX < 2 || SWP_TB_COUNT_BITS < 2
> at __count_to_swp_tb (mm/swap_table.h:227)

Hi YoungJun

Nice catch! Thanks a lot :)

> I think the right fix is widening swap_tb to 64 bits
> unconditionally (atomic64_t).

I'm a bit concerned that this would bloat memory usage on 32-bit systems...

>
> (Or, uglier, these arches could always route counts through the
> extend table.)
>

That doesn't look ugly to me if we use a ci->zero_bitmap; it seems quite
clean. The definition would be:

SWP_TABLE_USE_INLINE_ZEROMAP is defined only when BITS_PER_LONG leaves
enough room for SWP_TB_FLAGS_BITS. Otherwise, the zero flag moves out of
the swap table into a per-cluster bitmap:

struct swap_cluster_info {
	...
#ifndef SWP_TABLE_USE_INLINE_ZEROMAP
	unsigned long *zero_bitmap;
#endif
	...
};

And the helpers would look like this (the set helper, for example):
static inline void __swap_table_set_zero(struct swap_cluster_info *ci,
					 unsigned int ci_off)
{
#ifdef SWP_TABLE_USE_INLINE_ZEROMAP
	unsigned long swp_tb;

	/* Enough spare bits: keep the zero flag inside the entry itself */
	swp_tb = __swap_table_get(ci, ci_off);
	VM_WARN_ON(!swp_tb_is_countable(swp_tb));
	swp_tb |= SWP_TB_ZERO_MARK;
	__swap_table_set(ci, ci_off, swp_tb);
#else
	/* No room in the entry: record the flag in the per-cluster bitmap */
	__set_bit(ci_off, ci->zero_bitmap);
#endif
}

There are only three such helpers in total, so that looks fine. The
allocation part is just like the memcg_table one. Compared to this
version, it seems to need only a few dozen lines of change (a few
#ifdef SWP_TABLE_USE_INLINE_ZEROMAP blocks), and it isn't hard to
understand. What do you think?