Re: [PATCH v4 09/10] mm: zswap: Allocate pool batching resources if the crypto_alg supports batching.

From: Yosry Ahmed
Date: Tue Dec 03 2024 - 17:29:05 EST


On Tue, Dec 3, 2024 at 1:37 PM Sridhar, Kanchana P
<kanchana.p.sridhar@xxxxxxxxx> wrote:
>
>
> > -----Original Message-----
> > From: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
> > Sent: Tuesday, December 3, 2024 12:01 AM
> > To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>
> > Cc: Nhat Pham <nphamcs@xxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx; linux-
> > mm@xxxxxxxxx; hannes@xxxxxxxxxxx; yosryahmed@xxxxxxxxxx;
> > chengming.zhou@xxxxxxxxx; usamaarif642@xxxxxxxxx;
> > ryan.roberts@xxxxxxx; ying.huang@xxxxxxxxx; 21cnbao@xxxxxxxxx;
> > akpm@xxxxxxxxxxxxxxxxxxxx; linux-crypto@xxxxxxxxxxxxxxx;
> > davem@xxxxxxxxxxxxx; clabbe@xxxxxxxxxxxx; ardb@xxxxxxxxxx;
> > ebiggers@xxxxxxxxxx; surenb@xxxxxxxxxx; Accardi, Kristen C
> > <kristen.c.accardi@xxxxxxxxx>; Feghali, Wajdi K <wajdi.k.feghali@xxxxxxxxx>;
> > Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>
> > Subject: Re: [PATCH v4 09/10] mm: zswap: Allocate pool batching resources if
> > the crypto_alg supports batching.
> >
> > On Tue, Dec 03, 2024 at 12:30:30AM +0000, Sridhar, Kanchana P wrote:
> > >
> > > > Why do we need this "can_batch" field? IIUC, this can be determined
> > > > from the compressor internal fields itself, no?
> > > >
> > > > acomp_has_async_batching(acomp);
> > > >
> > > > Is this just for convenience, or is this actually an expensive thing to
> > compute?
> > >
> > > Thanks for your comments. This is a good question. I tried not to imply that
> > > batching resources have been allocated for the cpu based only on what
> > > acomp_has_async_batching() returns. It is possible that the cpu onlining
> > > code ran into an -ENOMEM error on any particular cpu. In this case, I set
> > > the pool->can_batch to "false", mainly for convenience, so that zswap
> > > can be somewhat insulated from migration. I agree that this may not be
> > > the best solution; and whether or not batching is enabled can be directly
> > > determined just before the call to crypto_acomp_batch_compress()
> > > based on:
> > >
> > > acomp_ctx->nr_reqs == SWAP_CRYPTO_BATCH_SIZE;
> >
> > With ahash request chaining, the idea is to accumulate as much
> > data as you can before you provide it to the Crypto API. The
> > API is responsible for dividing it up if the underlying driver
> > is only able to handle one request at a time.
> >
> > So that would be the ideal model to use for compression as well.
> > Provide as much data as you can and let the API handle the case
> > where the data needs to be divided up.
>
> Thanks for this suggestion! This sounds like a clean way to handle the
> batching/sequential compress/decompress within the crypto API as long
> as it can be contained in the crypto acompress layer.
> If the zswap maintainers don't have any objections, I can look into the
> feasibility of doing this.

Does this mean that instead of zswap breaking down the folio into
SWAP_CRYPTO_BATCH_SIZE -sized batches, we pass all the pages to the
crypto layer and let it do the batching as it pleases?

It sounds nice on the surface, but this implies that we have to
allocate folio_nr_pages() buffers in zswap, essentially as the
allocation is the same size as the folio itself. While the allocation
does not need to be contiguous, making a large number of allocations
in the reclaim path is definitely not something we want. For a 2M THP,
we'd need to allocate 2M in zswap_store().

If we choose to keep preallocating, assuming the maximum THP size is
2M, we need to allocate 2M * nr_cpus worth of buffers. That's a lot of
memory.

I feel like I am missing something.

>
> Thanks,
> Kanchana
>
> >
> > Cheers,
> > --
> > Email: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
> > Home Page: http://gondor.apana.org.au/~herbert/
> > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt