Re: [PATCH v13 21/22] mm: zswap: zswap_store() will process a large folio in batches.

From: Yosry Ahmed

Date: Thu Dec 11 2025 - 23:41:10 EST


On Fri, Dec 12, 2025 at 01:43:54AM +0000, Sridhar, Kanchana P wrote:
[..]
> >
> > Instead of this:
> >
> > > +/*
> > > + * Returns 0 if kmem_cache_alloc_bulk() failed and a positive number otherwise.
> > > + * The code for __kmem_cache_alloc_bulk() indicates that this positive number
> > > + * will be the @size requested, i.e., @nr_entries.
> > > + */
> > > +static __always_inline int zswap_entries_cache_alloc_batch(void **entries,
> > > +							   unsigned int nr_entries,
> > > +							   gfp_t gfp)
> > > +{
> > > +	int nr_alloc = kmem_cache_alloc_bulk(zswap_entry_cache, gfp,
> > > +					     nr_entries, entries);
> > > +
> >
> > Add this here:
> > /*
> > * kmem_cache_alloc_bulk() should return nr_entries on success
> > * and 0 on failure.
> > */
> >
>
> Sure.
>
> > > + WARN_ON(!nr_alloc || (nr_alloc != nr_entries));
> >
> > WARN_ON_ONCE() is sufficient, and why do we WARN if
> > kmem_cache_alloc_bulk() fails? I thought that was expected in some
> > cases.
>
> I can change this to a WARN_ON_ONCE(). The code for kmem_cache_alloc_bulk()
> ensures that either all entries are allocated or none are (partial
> allocations are freed and 0 is returned in the latter case). Given that,
> total failure can be expected.

Right, I mean specifically the !nr_alloc case. This should be expected,
so we should not WARN in this case, right? IIUC, we should do:

WARN_ON_ONCE(nr_alloc && nr_alloc != nr_entries)

>
> I believe I added the WARN_ON in response to an earlier review comment. I
> can either change it to WARN_ON_ONCE() or drop it entirely, since we have
> a fallback mechanism anyway.
>
> >
> > > +
> > > + return nr_alloc;
> > > +}
> > > +
> >
[..]
> > > +static bool zswap_store_pages(struct folio *folio,
> > > +			      long start,
> > > +			      long end,
> > > +			      struct obj_cgroup *objcg,
> > > +			      struct zswap_pool *pool,
> > > +			      int nid,
> > > +			      bool wb_enabled)
> > > {
> > > -	swp_entry_t page_swpentry = page_swap_entry(page);
> > > -	struct zswap_entry *entry, *old;
> > > -
> > > -	/* allocate entry */
> > > -	entry = zswap_entry_cache_alloc(GFP_KERNEL, page_to_nid(page));
> > > -	if (!entry) {
> > > -		zswap_reject_kmemcache_fail++;
> > > -		return false;
> > > +	struct zswap_entry *entries[ZSWAP_MAX_BATCH_SIZE];
> > > +	u8 i, store_fail_idx = 0, nr_pages = end - start;
> > > +
> > > +	VM_WARN_ON_ONCE(nr_pages > ZSWAP_MAX_BATCH_SIZE);
> > > +
> > > +	if (unlikely(!zswap_entries_cache_alloc_batch((void **)&entries[0],
> >
> > Is this equivalent to just passing in 'entries'?
>
> It is; however, I wanted to keep this consistent with the failure-case
> call to zswap_entries_cache_free_batch(), which passes in the address of
> the batch index at which the xarray store failed.

I understand, but I think it's clearer to pass 'entries'. Also, can we
make zswap_entries_cache_alloc_batch() take in the proper type and avoid
the cast at the callsites?

>
> >
> > > +						      nr_pages, GFP_KERNEL))) {
> > > +		for (i = 0; i < nr_pages; ++i) {
> > > +			entries[i] = zswap_entry_cache_alloc(GFP_KERNEL, nid);
> > > +
> > > +			if (unlikely(!entries[i])) {
> > > +				zswap_reject_kmemcache_fail++;
> > > +				/*
> > > +				 * While handling this error, we only need to
> > > +				 * call zswap_entries_cache_free_batch() for
> > > +				 * entries[0 .. @i-1].
> > > +				 */
> > > +				nr_pages = i;
> > > +				goto store_pages_failed;
> > > +			}
> > > +		}
> >
> >
> > Maybe move the fallback loop into zswap_entries_cache_alloc_batch()?
>
> I could; however, I would need to modify the API to return the failing
> index "i" so that the "goto store_pages_failed" works. IMO, inlining this
> makes the error handling more apparent, but let me know.

Hmm, yeah. Maybe make zswap_entries_cache_alloc_batch() free the already
allocated entries on failure? Then if zswap_entries_cache_alloc_batch()
fails, we can exit without the goto store_pages_failed. This is the first
failure mode, so we don't need any further cleanup anyway.