RE: [PATCH v13 13/22] crypto: iaa - IAA Batching for parallel compressions/decompressions.

From: Sridhar, Kanchana P
Date: Sun Nov 16 2025 - 13:53:48 EST



> -----Original Message-----
> From: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
> Sent: Friday, November 14, 2025 1:59 AM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx;
> hannes@xxxxxxxxxxx; yosry.ahmed@xxxxxxxxx; nphamcs@xxxxxxxxx;
> chengming.zhou@xxxxxxxxx; usamaarif642@xxxxxxxxx;
> ryan.roberts@xxxxxxx; 21cnbao@xxxxxxxxx;
> ying.huang@xxxxxxxxxxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx;
> senozhatsky@xxxxxxxxxxxx; sj@xxxxxxxxxx; kasong@xxxxxxxxxxx;
> linux-crypto@xxxxxxxxxxxxxxx; davem@xxxxxxxxxxxxx; clabbe@xxxxxxxxxxxx;
> ardb@xxxxxxxxxx; ebiggers@xxxxxxxxxx; surenb@xxxxxxxxxx; Accardi,
> Kristen C <kristen.c.accardi@xxxxxxxxx>; Gomes, Vinicius
> <vinicius.gomes@xxxxxxxxx>; Feghali, Wajdi K <wajdi.k.feghali@xxxxxxxxx>;
> Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>
> Subject: Re: [PATCH v13 13/22] crypto: iaa - IAA Batching for parallel
> compressions/decompressions.
>
> On Tue, Nov 04, 2025 at 01:12:26AM -0800, Kanchana P Sridhar wrote:
> >
> > +/**
> > + * This API provides IAA compress batching functionality for use by swap
> > + * modules.
> > + *
> > + * @ctx: compression ctx for the requested IAA mode (fixed/dynamic).
> > + * @parent_req: The "parent" iaa_req that contains SG lists for the batch's
> > + * inputs and outputs.
> > + * @unit_size: The unit size to apply to @parent_req->slen to get the number of
> > + * scatterlists it contains.
> > + *
> > + * The caller should check the individual sg->lengths in the @parent_req for
> > + * errors, including incompressible page errors.
> > + *
> > + * Returns 0 if all compress requests in the batch complete successfully,
> > + * -EINVAL otherwise.
> > + */
> > +static int iaa_comp_acompress_batch(
> > + struct iaa_compression_ctx *ctx,
> > + struct iaa_req *parent_req,
> > + unsigned int unit_size)
> > +{
> > + struct iaa_batch_ctx *cpu_ctx = raw_cpu_ptr(iaa_batch_ctx);
> > + int nr_reqs = parent_req->slen / unit_size;
> > + int errors[IAA_CRYPTO_MAX_BATCH_SIZE];
> > + int *dlens[IAA_CRYPTO_MAX_BATCH_SIZE];
> > + bool compressions_done = false;
> > + struct sg_page_iter sgiter;
> > + struct scatterlist *sg;
> > + struct iaa_req **reqs;
> > + int i, err = 0;
> > +
> > + mutex_lock(&cpu_ctx->mutex);
> > +
> > + reqs = cpu_ctx->reqs;
> > +
> > + __sg_page_iter_start(&sgiter, parent_req->src, nr_reqs,
> > + parent_req->src->offset/unit_size);
> > +
> > + for (i = 0; i < nr_reqs; ++i, ++sgiter.sg_pgoffset) {
> > + sg_set_page(reqs[i]->src, sg_page_iter_page(&sgiter), PAGE_SIZE, 0);
> > + reqs[i]->slen = PAGE_SIZE;
> > + }
> > +
> > + for_each_sg(parent_req->dst, sg, nr_reqs, i) {
> > + sg->length = PAGE_SIZE;
> > + dlens[i] = &sg->length;
> > + reqs[i]->dst = sg;
> > + reqs[i]->dlen = PAGE_SIZE;
> > + }
> > +
> > + iaa_set_req_poll(reqs, nr_reqs, true);
> > +
> > + /*
> > + * Prepare and submit the batch of iaa_reqs to IAA. IAA will process
> > + * these compress jobs in parallel.
> > + */
> > + for (i = 0; i < nr_reqs; ++i) {
> > + errors[i] = iaa_comp_acompress(ctx, reqs[i]);
> > +
> > + if (likely(errors[i] == -EINPROGRESS)) {
> > + errors[i] = -EAGAIN;
> > + } else if (unlikely(errors[i])) {
> > + *dlens[i] = errors[i];
> > + err = -EINVAL;
> > + } else {
> > + *dlens[i] = reqs[i]->dlen;
> > + }
> > + }
> > +
> > + /*
> > + * Asynchronously poll for and process IAA compress job completions.
> > + */
> > + while (!compressions_done) {
> > + compressions_done = true;
> > +
> > + for (i = 0; i < nr_reqs; ++i) {
> > + /*
> > + * Skip, if the compression has already completed
> > + * successfully or with an error.
> > + */
> > + if (errors[i] != -EAGAIN)
> > + continue;
> > +
> > + errors[i] = iaa_comp_poll(ctx, reqs[i]);
> > +
> > + if (errors[i]) {
> > + if (likely(errors[i] == -EAGAIN)) {
> > + compressions_done = false;
> > + } else {
> > + *dlens[i] = errors[i];
> > + err = -EINVAL;
> > + }
> > + } else {
> > + *dlens[i] = reqs[i]->dlen;
> > + }
> > + }
> > + }
>
> Why is this polling necessary?
>
> The crypto_acomp interface is async, even if the only user that
> you're proposing is synchronous.
>
> IOW the driver shouldn't care about synchronous polling at all.
> Just invoke the callback once all the sub-requests are complete
> and the wait call in zswap will take care of the rest.

Hi Herbert,

This is a simple, low-overhead implementation that takes advantage of
hardware parallelism by submitting multiple compress/decompress jobs
to the accelerator. From the driver's perspective, each job runs
independently of the others. For example, the driver makes no
assumptions about submission order versus completion order.
Completions can occur asynchronously.

The polling is intended for exactly the purpose you mention: to detect
when all the sub-requests are complete, and to set each sg->length
as its sub-request completes. Please let me know if I have understood
your question correctly.
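
To make the control flow concrete, here is a minimal userspace sketch
of the submit-then-poll pattern in the quoted patch. It is for
illustration only and is not driver code: struct fake_req,
fake_submit(), fake_poll(), batch_compress() and MAX_BATCH are
hypothetical stand-ins for iaa_req, iaa_comp_acompress(),
iaa_comp_poll(), iaa_comp_acompress_batch() and
IAA_CRYPTO_MAX_BATCH_SIZE, and the "hardware" is simulated by a
countdown of polls.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_BATCH 8	/* hypothetical batch limit */

/* Hypothetical request: a job that completes after a few polls. */
struct fake_req {
	int polls_left;		/* polls until the simulated job completes */
	int result;		/* final status: 0 or a negative errno */
	unsigned int dlen;	/* output length reported on success */
};

/* Simulated submit: the job is accepted and left in flight. */
static int fake_submit(struct fake_req *req)
{
	(void)req;
	return -EINPROGRESS;
}

/* Simulated poll: -EAGAIN while pending, then the final status. */
static int fake_poll(struct fake_req *req)
{
	if (req->polls_left > 0) {
		req->polls_left--;
		return -EAGAIN;
	}
	return req->result;
}

/*
 * Submit every job first, then poll until none is pending. Each
 * dlens[i] receives the output length on success or the negative
 * errno on failure. Returns 0 if all jobs succeed, -EINVAL otherwise.
 */
static int batch_compress(struct fake_req *reqs, int nr_reqs, int *dlens)
{
	int errors[MAX_BATCH];
	bool done = false;
	int i, err = 0;

	assert(nr_reqs >= 0 && nr_reqs <= MAX_BATCH);

	for (i = 0; i < nr_reqs; ++i) {
		errors[i] = fake_submit(&reqs[i]);

		if (errors[i] == -EINPROGRESS) {
			errors[i] = -EAGAIN;	/* mark as still pending */
		} else if (errors[i]) {
			dlens[i] = errors[i];
			err = -EINVAL;
		} else {
			dlens[i] = (int)reqs[i].dlen;
		}
	}

	while (!done) {
		done = true;

		for (i = 0; i < nr_reqs; ++i) {
			/* Skip jobs that already finished, either way. */
			if (errors[i] != -EAGAIN)
				continue;

			errors[i] = fake_poll(&reqs[i]);

			if (errors[i] == -EAGAIN) {
				done = false;
			} else if (errors[i]) {
				dlens[i] = errors[i];
				err = -EINVAL;
			} else {
				dlens[i] = (int)reqs[i].dlen;
			}
		}
	}

	return err;
}
```

With three simulated jobs where the second one fails, the other two
still report their own lengths regardless of the order in which they
finish, and the batch as a whole returns -EINVAL.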

Thanks,
Kanchana

>
> Cheers,
> --
> Email: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt