Re: [PATCH v2 11/11] mm, swap: merge zeromap into swap table

From: YoungJun Park

Date: Sun Apr 19 2026 - 08:51:14 EST

On Sat, Apr 18, 2026 at 09:34:35PM +0800, Kairui Song wrote:
> On Sat, Apr 18, 2026 at 8:28 PM YoungJun Park <youngjun.park@xxxxxxx> wrote:
> >
> > On Fri, Apr 17, 2026 at 02:34:41AM +0800, Kairui Song via B4 Relay wrote:
> >
> > > *
> > > * Usages:
> > > *
> > > @@ -74,17 +76,22 @@ struct swap_memcg_table {
> > > #define SWP_TB_PFN_MARK_BITS 2
> > > #define SWP_TB_PFN_MARK_MASK (BIT(SWP_TB_PFN_MARK_BITS) - 1)
> > >
> > > -/* SWAP_COUNT part for PFN or shadow, the width can be shrunk or extended */
> > > -#define SWP_TB_COUNT_BITS min(4, BITS_PER_LONG - SWP_TB_PFN_BITS)
> > > +/* SWAP_COUNT and flags for PFN or shadow, width can be shrunk or extended */
> > > +#define SWP_TB_FLAGS_BITS min(5, BITS_PER_LONG - SWP_TB_PFN_BITS)
> > > +#define SWP_TB_COUNT_BITS (SWP_TB_FLAGS_BITS - 1)
> >
> > Hi Kairui :)
> >
> > Would this break the build on 32-bit arches with 40-bit phys
> > addrs (MAX_POSSIBLE_PHYSMEM_BITS = 40)?
> >
> > Architectures I checked.
> > - ARM LPAE (CONFIG_ARM_LPAE=y)
> > - ARC PAE40 (CONFIG_ARC_HAS_PAE40=y)
> > - MIPS XPA (CONFIG_XPA=y)
> >
> > Calculations.
> >
> > SWP_TB_PFN_BITS = 28 + 2 = 30
> > SWP_TB_FLAGS_BITS = min(5, 32 - 30) = 2
> > SWP_TB_COUNT_BITS = 2 - 1 = 1
> >
> > The BUILD_BUG_ON looks like the real problem: it needs at
> > least 3 count values (free/used/overflow).
> >
> > BUILD_BUG_ON(SWP_TB_COUNT_MAX < 2 || SWP_TB_COUNT_BITS < 2);
> >
> > Confirmed with a cross build (multi_v7_defconfig + lpae.config).
> >
> > error: BUILD_BUG_ON failed: SWP_TB_COUNT_MAX < 2 || SWP_TB_COUNT_BITS < 2
> > at __count_to_swp_tb (mm/swap_table.h:227)
>
> Hi YoungJun
>
> Nice catch! Thanks a lot :)
>
> > I think the right fix is widening swap_tb to 64 bits
> > unconditionally (atomic64_t).
>
> I'm a bit concerned that memory usage on 32 bits will bloat up...
>
> >
> > (Or, uglier, these arches could always route counts through the
> > extend table.)
> >
>
> It doesn't seem ugly with a ci->zero_bitmap; it looks clean to me. The
> definition will be:
>
> SWP_TABLE_USE_INLINE_ZEROMAP is defined when BITS_PER_LONG has enough
> room for SWP_TB_FLAGS_BITS; otherwise the zero marks go to a bitmap:
>
> struct swap_cluster_info {
> 	...
> #ifndef SWP_TABLE_USE_INLINE_ZEROMAP
> 	unsigned long *zero_bitmap;
> #endif
> 	...
> };
>
> And helpers will be:
> static inline void __swap_table_set_zero(struct swap_cluster_info *ci,
> 					 unsigned int ci_off)
> {
> #ifndef SWP_TABLE_USE_INLINE_ZEROMAP
> 	/* No room in the table entry: use the per-cluster bitmap */
> 	bitmap_set(ci->zero_bitmap, ci_off, 1);
> #else
> 	unsigned long swp_tb;
>
> 	swp_tb = __swap_table_get(ci, ci_off);
> 	VM_WARN_ON(!swp_tb_is_countable(swp_tb));
> 	swp_tb |= SWP_TB_ZERO_MARK;
> 	__swap_table_set(ci, ci_off, swp_tb);
> #endif
> }
>
> There are only three helpers in total, so it looks fine. The allocation
> part is just like the memcg_table. Compared to this version, it seems to
> need only a few dozen lines of change (a few #ifdef
> SWP_TABLE_USE_INLINE_ZEROMAP) and is not hard to understand. What do you
> think?

Hi Kairui,

Sounds good. It is easy to understand and does not need many changes.
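
To check my understanding of the split, here is a minimal userspace model
of the three helpers (set/clear/test). It is only a sketch, not kernel
code: cluster_model, MODEL_INLINE_ZEROMAP, CLUSTER_SLOTS, ZERO_MARK and the
model_* names are all hypothetical, and the flag position is an assumption.
Without MODEL_INLINE_ZEROMAP it models the bitmap fallback.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define CLUSTER_SLOTS 512	/* assumption: slots per cluster */
#define ULONG_BITS (sizeof(unsigned long) * CHAR_BIT)
#define ZERO_MARK (1UL << (ULONG_BITS - 1))	/* hypothetical flag bit */

/* Define MODEL_INLINE_ZEROMAP to model entries that have a spare bit. */
struct cluster_model {
	unsigned long table[CLUSTER_SLOTS];
#ifndef MODEL_INLINE_ZEROMAP
	/* Fallback: one bit per slot, kept outside the table entries */
	unsigned long zero_bitmap[(CLUSTER_SLOTS + ULONG_BITS - 1) / ULONG_BITS];
#endif
};

static inline void model_set_zero(struct cluster_model *ci, unsigned int off)
{
	assert(off < CLUSTER_SLOTS);
#ifdef MODEL_INLINE_ZEROMAP
	ci->table[off] |= ZERO_MARK;
#else
	ci->zero_bitmap[off / ULONG_BITS] |= 1UL << (off % ULONG_BITS);
#endif
}

static inline void model_clear_zero(struct cluster_model *ci, unsigned int off)
{
	assert(off < CLUSTER_SLOTS);
#ifdef MODEL_INLINE_ZEROMAP
	ci->table[off] &= ~ZERO_MARK;
#else
	ci->zero_bitmap[off / ULONG_BITS] &= ~(1UL << (off % ULONG_BITS));
#endif
}

static inline bool model_test_zero(const struct cluster_model *ci,
				   unsigned int off)
{
	assert(off < CLUSTER_SLOTS);
#ifdef MODEL_INLINE_ZEROMAP
	return ci->table[off] & ZERO_MARK;
#else
	return ci->zero_bitmap[off / ULONG_BITS] & (1UL << (off % ULONG_BITS));
#endif
}
```

In the fallback build the table entry itself is never touched, so the
count bits keep their full width there.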

Another option would be to use a 64-bit entry only on LPAE-like arches
(not on every 32-bit arch), though that would mean adding a separate
set of atomic64 ops.

The direction you proposed seems cleaner, so I'm on board.
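
For deciding when the inline flag fits, the width arithmetic from my
earlier mail can be written as a small userspace sketch. The function
names are hypothetical, 4K pages (PAGE_SHIFT = 12) are an assumption, and
the "count needs at least 2 bits" rule is taken from the BUILD_BUG_ON.

```c
#include <assert.h>

#define PFN_MARK_BITS 2		/* SWP_TB_PFN_MARK_BITS in the patch */
#define PAGE_SHIFT_ASSUMED 12	/* assumption: 4K pages */

static inline int min_int(int a, int b)
{
	return a < b ? a : b;
}

/* Bits an entry needs for the PFN plus its mark bits */
static inline int pfn_bits(int physmem_bits)
{
	assert(physmem_bits >= PAGE_SHIFT_ASSUMED);
	return physmem_bits - PAGE_SHIFT_ASSUMED + PFN_MARK_BITS;
}

/* SWP_TB_FLAGS_BITS as defined in this patch */
static inline int flags_bits(int bits_per_long, int physmem_bits)
{
	return min_int(5, bits_per_long - pfn_bits(physmem_bits));
}

/* SWP_TB_COUNT_BITS: what remains after one bit goes to the zero flag */
static inline int count_bits(int bits_per_long, int physmem_bits)
{
	return flags_bits(bits_per_long, physmem_bits) - 1;
}

/* Candidate condition: keep the zero flag inline only if count keeps 2 bits */
static inline int inline_zeromap_fits(int bits_per_long, int physmem_bits)
{
	return count_bits(bits_per_long, physmem_bits) >= 2;
}
```

For BITS_PER_LONG = 32 and MAX_POSSIBLE_PHYSMEM_BITS = 40 this gives 30
PFN bits, 2 flag bits and 1 count bit, so those configs would take the
bitmap path and the count would get both remaining bits back.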

Thanks,
YoungJun Park