RE: [PATCH v7] mm/zswap: move to use crypto_acomp API for hardware acceleration
From: Song Bao Hua (Barry Song)
Date: Mon Nov 09 2020 - 06:46:12 EST
> -----Original Message-----
> From: Sebastian Andrzej Siewior [mailto:bigeasy@xxxxxxxxxxxxx]
> Sent: Monday, November 9, 2020 11:29 PM
> To: Song Bao Hua (Barry Song) <song.bao.hua@xxxxxxxxxxxxx>
> Cc: linux-mm@xxxxxxxxx; linux-crypto@xxxxxxxxxxxxxxx;
> akpm@xxxxxxxxxxxxxxxxxxxx; Linuxarm <linuxarm@xxxxxxxxxx>; fanghao (A)
> <fanghao11@xxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx; Vitaly Wool
> <vitalywool@xxxxxxxxx>; Luis Claudio R . Goncalves <lgoncalv@xxxxxxxxxx>;
> Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>; David S . Miller
> <davem@xxxxxxxxxxxxx>; Mahipal Challa <mahipalreddy2006@xxxxxxxxx>;
> Seth Jennings <sjenning@xxxxxxxxxx>; Dan Streetman <ddstreet@xxxxxxxx>;
> Wangzhou (B) <wangzhou1@xxxxxxxxxxxxx>; Colin Ian King
> <colin.king@xxxxxxxxxxxxx>
> Subject: Re: [PATCH v7] mm/zswap: move to use crypto_acomp API for
> hardware acceleration
>
> I've been looking at the patch and it looks like it should work. Having numbers
Thanks very much for reviewing it and the previous patches. I appreciate.
> to backup the performance in the pure-software version and with HW
> acceleration would _very_ nice to have.
Sure. The 1st step is fixing the broken connect between new crypto APIs and
zswap. Then we are going to dig into possible performance improvements.
If we put all the goals in the single first step, it will be hard for us to see an
accomplishment of any real improvement.
>
> On 2020-11-07 19:53:32 [+1300], Barry Song wrote:
> > index fbb7829..73f04de 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -415,30 +445,54 @@ static int zswap_dstmem_dead(unsigned int cpu)
> …
> > + acomp_ctx->req = req;
> > +
> > + crypto_init_wait(&acomp_ctx->wait);
> > + /*
> > + * if the backend of acomp is async zip, crypto_req_done() will wakeup
> > + * crypto_wait_req(); if the backend of acomp is scomp, the callback
> > + * won't be called, crypto_wait_req() will return without blocking.
> > + */
> > + acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
> > + crypto_req_done, &acomp_ctx->wait);
> > +
> > + acomp_ctx->mutex = per_cpu(zswap_mutex, cpu);
> > + acomp_ctx->dstmem = per_cpu(zswap_dstmem, cpu);
>
> You added a comment here and there you never mentioned that this single
> per-CPU mutex protects the per-CPU context (which you can have more than
> one on a single CPU) and the scratch/dstmem which is one per-CPU. Of course
> if you read the code you figure it out.
> I still think that you should have a pool of memory and crypto contexts which
> you can use instead of having them strictly per-CPU. The code is fully
> preemptible and you may have multiple requests on the same CPU.
> Yes, locking works but at the same you block processing while waiting on a lock
> and the "reserved memory" on other CPUs remains unused.
For sure the buffer pool will be put into my Todo list on zswap project. For this
moment, let's fix the broken connection between ZIP drivers and zswap first.
This will help build the faith on the whole project and motivate the move to
the next step.
Step by step, we will make zswap better and better on performance by leveraging
the power of ZIP hardware.
>
> Sebastian
Thanks
Barry