Re: [PATCH v5 1/2] mm: store zero pages to be swapped out in a bitmap

From: Yosry Ahmed
Date: Fri Jun 14 2024 - 14:42:07 EST


On Fri, Jun 14, 2024 at 5:06 AM Chengming Zhou <chengming.zhou@xxxxxxxxx> wrote:
>
> On 2024/6/14 18:07, Usama Arif wrote:
> > Approximately 10-20% of pages to be swapped out are zero pages [1].
> > Rather than reading/writing these pages to flash resulting
> > in increased I/O and flash wear, a bitmap can be used to mark these
> > pages as zero at write time, and the pages can be filled at
> > read time if the bit corresponding to the page is set.
> > With this patch, NVMe writes in Meta server fleet decreased
> > by almost 10% with conventional swap setup (zswap disabled).
> >
> > [1] https://lore.kernel.org/all/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1/
> >
> > Signed-off-by: Usama Arif <usamaarif642@xxxxxxxxx>
>
> Looks good to me, only some small nits below.
>
> Reviewed-by: Chengming Zhou <chengming.zhou@xxxxxxxxx>
>
> > ---
> > include/linux/swap.h | 1 +
> > mm/page_io.c | 113 ++++++++++++++++++++++++++++++++++++++++++-
> > mm/swapfile.c | 15 ++++++
> > 3 files changed, 128 insertions(+), 1 deletion(-)
> >
> [...]
> > +
> > +static void swap_zeromap_folio_set(struct folio *folio)
> > +{
> > + struct swap_info_struct *sis = swp_swap_info(folio->swap);
> > + swp_entry_t entry;
> > + unsigned int i;
> > +
> > + for (i = 0; i < folio_nr_pages(folio); i++) {
> > + entry = page_swap_entry(folio_page(folio, i));
>
> It seems simpler to use:
>
> swp_entry_t entry = folio->swap;
>
> for (i = 0; i < folio_nr_pages(folio); i++, entry.val++)

I was actually thinking we could introduce folio_swap_entry(folio, i)
after the series. Multiple callers of page_swap_entry() have a folio
already. It would save some compound_head() calls.
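
To make that concrete, here is a minimal userspace sketch of what such a
helper could look like. folio_swap_entry() does not exist today, and
struct sketch_folio is only a stand-in for the real struct folio, so
treat every name below as an assumption. It relies on the swap entries
of a large folio being consecutive, starting at folio->swap:

```c
#include <assert.h>

/* Userspace stand-in for the kernel's swp_entry_t. */
typedef struct {
	unsigned long val;
} swp_entry_t;

/* Hypothetical stand-in for struct folio; only the fields used here. */
struct sketch_folio {
	swp_entry_t swap;	/* swap entry of the first page */
	unsigned long nr_pages;	/* number of pages in the folio */
};

/*
 * Hypothetical helper: return the swap entry of the i-th page of a folio.
 * It starts from the folio directly, so no page -> folio lookup is needed.
 */
static swp_entry_t folio_swap_entry(const struct sketch_folio *folio,
				    unsigned long i)
{
	swp_entry_t entry = folio->swap;

	assert(i < folio->nr_pages);
	entry.val += i;
	return entry;
}
```

A caller that already holds the folio would then loop over
folio_swap_entry(folio, i) rather than going through
page_swap_entry(folio_page(folio, i)).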

Alternatively, for this patch we can introduce
zeromap_update_range(zeromap, offset, size, value). Then we can use it
in swap_zeromap_folio_set/clear() as well as swap_range_free(). It
would also be a good place to park the comment about using atomic
operations (set_bit() and clear_bit()).
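
Something along these lines, as a rough userspace sketch rather than a
tested kernel patch. The sketch_*_bit() helpers are stand-ins built on
C11 atomics for the kernel's set_bit()/clear_bit()/test_bit(), and the
exact signature of zeromap_update_range() is only an assumption based on
the argument list above:

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Userspace stand-in for the kernel's atomic set_bit(). */
static void sketch_set_bit(size_t nr, _Atomic unsigned long *map)
{
	atomic_fetch_or(&map[nr / SKETCH_BITS_PER_LONG],
			1UL << (nr % SKETCH_BITS_PER_LONG));
}

/* Userspace stand-in for the kernel's atomic clear_bit(). */
static void sketch_clear_bit(size_t nr, _Atomic unsigned long *map)
{
	atomic_fetch_and(&map[nr / SKETCH_BITS_PER_LONG],
			 ~(1UL << (nr % SKETCH_BITS_PER_LONG)));
}

/* Userspace stand-in for the kernel's test_bit(). */
static bool sketch_test_bit(size_t nr, _Atomic unsigned long *map)
{
	return (atomic_load(&map[nr / SKETCH_BITS_PER_LONG]) >>
		(nr % SKETCH_BITS_PER_LONG)) & 1UL;
}

/*
 * Hypothetical helper: set or clear 'size' consecutive bits starting at
 * 'offset'. Each bit is updated with an atomic read-modify-write, so a
 * concurrent update to a neighbouring bit in the same word is not lost.
 */
static void zeromap_update_range(_Atomic unsigned long *zeromap,
				 size_t offset, size_t size, bool value)
{
	size_t i;

	assert(zeromap != NULL);
	for (i = 0; i < size; i++) {
		if (value)
			sketch_set_bit(offset + i, zeromap);
		else
			sketch_clear_bit(offset + i, zeromap);
	}
}
```

swap_zeromap_folio_set/clear() would then reduce to a single call with
the folio's starting offset and folio_nr_pages(folio), and
swap_range_free() could pass its own offset and number of entries with
value == false.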