RE: [PATCH v1 1/2] mm: zswap: Modified zswap_store_page() to process multiple pages in a folio.

From: Sridhar, Kanchana P
Date: Mon Dec 02 2024 - 20:13:16 EST



> -----Original Message-----
> From: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> Sent: Monday, December 2, 2024 11:34 AM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx;
> hannes@xxxxxxxxxxx; nphamcs@xxxxxxxxx; chengming.zhou@xxxxxxxxx;
> usamaarif642@xxxxxxxxx; ryan.roberts@xxxxxxx; 21cnbao@xxxxxxxxx;
> akpm@xxxxxxxxxxxxxxxxxxxx; Feghali, Wajdi K <wajdi.k.feghali@xxxxxxxxx>;
> Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>
> Subject: Re: [PATCH v1 1/2] mm: zswap: Modified zswap_store_page() to
> process multiple pages in a folio.
>
> On Wed, Nov 27, 2024 at 2:53 PM Kanchana P Sridhar
> <kanchana.p.sridhar@xxxxxxxxx> wrote:
> >
> > Modified zswap_store() to store the folio in batches of
> > SWAP_CRYPTO_BATCH_SIZE pages. Accordingly, refactored
> zswap_store_page()
> > into zswap_store_pages() that processes a range of pages in the folio.
> > zswap_store_pages() is a vectorized version of zswap_store_page().
> >
> > For now, zswap_store_pages() will sequentially compress these pages with
> > zswap_compress().
> >
> > These changes are follow-up to code review comments received for [1], and
> > are intended to set up zswap_store() for batching with Intel IAA.
> >
> > [1]: https://patchwork.kernel.org/project/linux-
> mm/patch/20241123070127.332773-11-kanchana.p.sridhar@xxxxxxxxx/
> >
> > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@xxxxxxxxx>
> > ---
> > include/linux/zswap.h | 1 +
> > mm/zswap.c | 154 ++++++++++++++++++++++++------------------
> > 2 files changed, 88 insertions(+), 67 deletions(-)
> >
> > diff --git a/include/linux/zswap.h b/include/linux/zswap.h
> > index d961ead91bf1..05a81e750744 100644
> > --- a/include/linux/zswap.h
> > +++ b/include/linux/zswap.h
> > @@ -7,6 +7,7 @@
> >
> > struct lruvec;
> >
> > +#define SWAP_CRYPTO_BATCH_SIZE 8UL
> > extern atomic_long_t zswap_stored_pages;
> >
> > #ifdef CONFIG_ZSWAP
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index f6316b66fb23..b09d1023e775 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -1409,78 +1409,96 @@ static void shrink_worker(struct work_struct
> *w)
> > * main API
> > **********************************/
> >
> > -static ssize_t zswap_store_page(struct page *page,
> > - struct obj_cgroup *objcg,
> > - struct zswap_pool *pool)
> > +/*
> > + * Store multiple pages in @folio, starting from the page at index @si up to
> > + * and including the page at index @ei.
> > + */
> > +static ssize_t zswap_store_pages(struct folio *folio,
> > + long si,
> > + long ei,
> > + struct obj_cgroup *objcg,
> > + struct zswap_pool *pool)
> > {
> > - swp_entry_t page_swpentry = page_swap_entry(page);
> > + struct page *page;
> > + swp_entry_t page_swpentry;
> > struct zswap_entry *entry, *old;
> > + size_t compressed_bytes = 0;
> > + u8 nr_pages = ei - si + 1;
> > + u8 i;
> > +
> > + for (i = 0; i < nr_pages; ++i) {
> > + page = folio_page(folio, si + i);
> > + page_swpentry = page_swap_entry(page);
> > +
> > + /* allocate entry */
> > + entry = zswap_entry_cache_alloc(GFP_KERNEL,
> page_to_nid(page));
> > + if (!entry) {
> > + zswap_reject_kmemcache_fail++;
> > + return -EINVAL;
> > + }
>
> I think this patch is wrong on its own, for example if an allocation
> fails in the above loop we exit without cleaning up previous
> allocations. I think it's fixed in patch 2 but we cannot introduce

I think there might be a misunderstanding. zswap_store_pages() will
clean up local resources allocated during an iteration of the for loop,
upon an error in that iteration. If you see the "goto store_failed" and
"goto compress_failed" this would explain what I mean. If an allocation
fails for an iteration, we simply return -EINVAL, and zswap_store()
will goto the "put_pool" label with "ret" still false, which will delete
all zswap entries for the folio (allocated up until the error iteration in
zswap_store_pages(); or potentially already in the xarray).

Hence, there is no bug and each of the two patches are correct by
themselves AFAICT, but please let me know if I am missing anything. Thanks!

> bugs in-between patches. I think the helpers in patch 2 don't really
> help as I mentioned. Please combine the changes and keep them in the
> main series (unless you have a reason not to).

Sure. As noted in my earlier response to comments received for patch 2,
I can either inline all iterations or create helpers for all iterations over
the pages in a batch. Appreciate your suggestions on which would be a
better approach.

Thanks,
Kanchana

>
> >
> > - /* allocate entry */
> > - entry = zswap_entry_cache_alloc(GFP_KERNEL, page_to_nid(page));
> > - if (!entry) {
> > - zswap_reject_kmemcache_fail++;
> > - return -EINVAL;
> > - }
> > -
> > - if (!zswap_compress(page, entry, pool))
> > - goto compress_failed;
> > + if (!zswap_compress(page, entry, pool))
> > + goto compress_failed;
> >
> > - old = xa_store(swap_zswap_tree(page_swpentry),
> > - swp_offset(page_swpentry),
> > - entry, GFP_KERNEL);
> > - if (xa_is_err(old)) {
> > - int err = xa_err(old);
> > + old = xa_store(swap_zswap_tree(page_swpentry),
> > + swp_offset(page_swpentry),
> > + entry, GFP_KERNEL);
> > + if (xa_is_err(old)) {
> > + int err = xa_err(old);
> >
> > - WARN_ONCE(err != -ENOMEM, "unexpected xarray error: %d\n",
> err);
> > - zswap_reject_alloc_fail++;
> > - goto store_failed;
> > - }
> > + WARN_ONCE(err != -ENOMEM, "unexpected xarray error:
> %d\n", err);
> > + zswap_reject_alloc_fail++;
> > + goto store_failed;
> > + }
> >
> > - /*
> > - * We may have had an existing entry that became stale when
> > - * the folio was redirtied and now the new version is being
> > - * swapped out. Get rid of the old.
> > - */
> > - if (old)
> > - zswap_entry_free(old);
> > + /*
> > + * We may have had an existing entry that became stale when
> > + * the folio was redirtied and now the new version is being
> > + * swapped out. Get rid of the old.
> > + */
> > + if (old)
> > + zswap_entry_free(old);
> >
> > - /*
> > - * The entry is successfully compressed and stored in the tree, there is
> > - * no further possibility of failure. Grab refs to the pool and objcg.
> > - * These refs will be dropped by zswap_entry_free() when the entry is
> > - * removed from the tree.
> > - */
> > - zswap_pool_get(pool);
> > - if (objcg)
> > - obj_cgroup_get(objcg);
> > + /*
> > + * The entry is successfully compressed and stored in the tree,
> there is
> > + * no further possibility of failure. Grab refs to the pool and objcg.
> > + * These refs will be dropped by zswap_entry_free() when the
> entry is
> > + * removed from the tree.
> > + */
> > + zswap_pool_get(pool);
> > + if (objcg)
> > + obj_cgroup_get(objcg);
> >
> > - /*
> > - * We finish initializing the entry while it's already in xarray.
> > - * This is safe because:
> > - *
> > - * 1. Concurrent stores and invalidations are excluded by folio lock.
> > - *
> > - * 2. Writeback is excluded by the entry not being on the LRU yet.
> > - * The publishing order matters to prevent writeback from seeing
> > - * an incoherent entry.
> > - */
> > - entry->pool = pool;
> > - entry->swpentry = page_swpentry;
> > - entry->objcg = objcg;
> > - entry->referenced = true;
> > - if (entry->length) {
> > - INIT_LIST_HEAD(&entry->lru);
> > - zswap_lru_add(&zswap_list_lru, entry);
> > - }
> > + /*
> > + * We finish initializing the entry while it's already in xarray.
> > + * This is safe because:
> > + *
> > + * 1. Concurrent stores and invalidations are excluded by folio
> lock.
> > + *
> > + * 2. Writeback is excluded by the entry not being on the LRU yet.
> > + * The publishing order matters to prevent writeback from seeing
> > + * an incoherent entry.
> > + */
> > + entry->pool = pool;
> > + entry->swpentry = page_swpentry;
> > + entry->objcg = objcg;
> > + entry->referenced = true;
> > + if (entry->length) {
> > + INIT_LIST_HEAD(&entry->lru);
> > + zswap_lru_add(&zswap_list_lru, entry);
> > + }
> >
> > - return entry->length;
> > + compressed_bytes += entry->length;
> > + continue;
> >
> > store_failed:
> > - zpool_free(pool->zpool, entry->handle);
> > + zpool_free(pool->zpool, entry->handle);
> > compress_failed:
> > - zswap_entry_cache_free(entry);
> > - return -EINVAL;
> > + zswap_entry_cache_free(entry);
> > + return -EINVAL;
> > + }
> > +
> > + return compressed_bytes;
> > }
> >
> > bool zswap_store(struct folio *folio)
> > @@ -1492,7 +1510,7 @@ bool zswap_store(struct folio *folio)
> > struct zswap_pool *pool;
> > size_t compressed_bytes = 0;
> > bool ret = false;
> > - long index;
> > + long si, ei, incr = SWAP_CRYPTO_BATCH_SIZE;
> >
> > VM_WARN_ON_ONCE(!folio_test_locked(folio));
> > VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
> > @@ -1526,11 +1544,13 @@ bool zswap_store(struct folio *folio)
> > mem_cgroup_put(memcg);
> > }
> >
> > - for (index = 0; index < nr_pages; ++index) {
> > - struct page *page = folio_page(folio, index);
> > + /* Store the folio in batches of SWAP_CRYPTO_BATCH_SIZE pages. */
> > + for (si = 0, ei = min(si + incr - 1, nr_pages - 1);
> > + ((si < nr_pages) && (ei < nr_pages));
> > + si = ei + 1, ei = min(si + incr - 1, nr_pages - 1)) {
> > ssize_t bytes;
> >
> > - bytes = zswap_store_page(page, objcg, pool);
> > + bytes = zswap_store_pages(folio, si, ei, objcg, pool);
> > if (bytes < 0)
> > goto put_pool;
> > compressed_bytes += bytes;
> > @@ -1565,9 +1585,9 @@ bool zswap_store(struct folio *folio)
> > struct zswap_entry *entry;
> > struct xarray *tree;
> >
> > - for (index = 0; index < nr_pages; ++index) {
> > - tree = swap_zswap_tree(swp_entry(type, offset + index));
> > - entry = xa_erase(tree, offset + index);
> > + for (si = 0; si < nr_pages; ++si) {
> > + tree = swap_zswap_tree(swp_entry(type, offset + si));
> > + entry = xa_erase(tree, offset + si);
> > if (entry)
> > zswap_entry_free(entry);
> > }
> > --
> > 2.27.0
> >