Re: [PATCH v3 12/12] mm, swap: merge zeromap into swap table

From: Kairui Song

Date: Wed May 13 2026 - 13:53:04 EST

On Tue, May 12, 2026 at 1:04 AM Chris Li <chrisl@xxxxxxxxxx> wrote:
>
> On Tue, Apr 21, 2026 at 8:16 AM Kairui Song via B4 Relay
> <devnull+kasong.tencent.com@xxxxxxxxxx> wrote:
> >
> > From: Kairui Song <kasong@xxxxxxxxxxx>
> >
> > By allocating one additional bit in the swap table entry's flags field
> > alongside the count, we can store the zeromap inline
> >
> > For certain 32-bit archs, there might not be enough bits in the swap
> > table to contain both PFN and flags. Therefore, conditionally let each
> > cluster have a zeromap field at build time, and use that instead of the
> > swap table for these archs. A few macros were moved to different headers
> > for build time struct definition.
>
> It might be worthwhile to mention the user-visible impact. For 64 bit
> systems. The zeromap will store in the swap table, avoiding zeromap
> allocation. It reduces the allocated memory. That is the happy path.
> For certain 32-bit architectures, if the swapfile cluster is not fully
> used, it will use less memory for zeromap. The empty cluster does not
> allocate a zeromap. We still save memory. In the worst case, all
> cluster are fully populated. We will use memory similar to the
> previous zeromap implementation.

Will add this to commit message.

> > +/*
> > + * Return the count of contiguous swap entries that share the same
> > + * zeromap status as the starting entry. If is_zerop is not NULL,
> > + * it will return the zeromap status of the starting entry.
> > + *
> > + * Context: Caller must ensure the cluster containing the entries
> > + * that will be checked won't be freed.
> > + */
> > +static int swap_zeromap_batch(swp_entry_t entry, int max_nr,
> > + bool *is_zerop)
> > +{
> > + bool is_zero;
> > + struct swap_cluster_info *ci = __swap_entry_to_cluster(entry);
> > + unsigned int ci_start = swp_cluster_offset(entry), ci_off, ci_end;
> > +
> > + ci_off = ci_start;
> > + ci_end = ci_off + max_nr;
>
> Should we check ci_end less than the cluster's end and complain if not?

Currently, we only call this function for folio->swap with
folio_nr_pages as the max_nr, so this scenario should never occur, but
a sanity check is always good to have. It might be even better to
convert this whole function to work on a folio basis later, since a
locked swap cache folio has much better restraint on its swap entries.
This can be done later as a clean up.

>
> It seems using a for loop can be simpler. The loop index serves as a
> counter as well.
> Totally untested code:
>
> int i;
> rcu_read_lock();
> is_zero = __swap_table_test_zero(ci, ci_start);
> for (i =1; i < max_nr ; i++)
> if (is_zero != __swap_table_test_zero(ci, ci_start + i))
> break;
> rcu_read_unlock();
> if (is_zerop)
> *is_zerop = is_zero;
> return i;
>

Thanks! Looks good, it's more compact indeed.