Re: [PATCH v6] mm/zswap: move to use crypto_acomp API for hardware acceleration

From: Sebastian Andrzej Siewior
Date: Mon Sep 28 2020 - 11:24:38 EST


On 2020-08-19 00:31:00 [+1200], Barry Song wrote:
> diff --git a/mm/zswap.c b/mm/zswap.c
> index fbb782924ccc..00b5f14a7332 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -127,9 +129,17 @@ module_param_named(same_filled_pages_enabled, zswap_same_filled_pages_enabled,
> * data structures
> **********************************/
>
> +struct crypto_acomp_ctx {
> + struct crypto_acomp *acomp;
> + struct acomp_req *req;
> + struct crypto_wait wait;
> + u8 *dstmem;
> + struct mutex *mutex;
> +};
> +
> struct zswap_pool {
> struct zpool *zpool;
> - struct crypto_comp * __percpu *tfm;
> + struct crypto_acomp_ctx __percpu *acomp_ctx;
> struct kref kref;
> struct list_head list;
> struct work_struct release_work;
> @@ -388,23 +398,43 @@ static struct zswap_entry *zswap_entry_find_get(struct rb_root *root,
> * per-cpu code
> **********************************/
> static DEFINE_PER_CPU(u8 *, zswap_dstmem);
> +/*
> + * If users dynamically change the zpool type and compressor at runtime, i.e.
> + * zswap is running, zswap can have more than one zpool on one cpu, but they
> + * are sharing dtsmem. So we need this mutex to be per-cpu.
> + */
> +static DEFINE_PER_CPU(struct mutex *, zswap_mutex);

There is no need to make it sturct mutex*. You could make it struct
mutex. The following make it more obvious how the relation here is (even
without a comment):

|struct zswap_mem_pool {
| void *dst_mem;
| struct mutex lock;
|};
|
|static DEFINE_PER_CPU(struct zswap_mem_pool , zswap_mem_pool);

Please access only this, don't add use a pointer in a another struct to
dest_mem or lock which may confuse people.

This resource is per-CPU. Do you really utilize them all? On 2, 8, 16,
64, 128 core system? More later…

> @@ -916,14 +974,21 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle)
>
> case ZSWAP_SWAPCACHE_NEW: /* page is locked */
> /* decompress */
> + acomp_ctx = this_cpu_ptr(entry->pool->acomp_ctx);

My feeling is that this triggers a warning with CONFIG_DEBUG_PREEMPT. I
don't see how it could be avoid it. If you are not preemptible here then
you must not sleep later.

> +
> dlen = PAGE_SIZE;
> src = (u8 *)zhdr + sizeof(struct zswap_header);
> - dst = kmap_atomic(page);
> - tfm = *get_cpu_ptr(entry->pool->tfm);
> - ret = crypto_comp_decompress(tfm, src, entry->length,
> - dst, &dlen);
> - put_cpu_ptr(entry->pool->tfm);
> - kunmap_atomic(dst);
> + dst = kmap(page);
> +
> + mutex_lock(acomp_ctx->mutex);
> + sg_init_one(&input, src, entry->length);
> + sg_init_one(&output, dst, dlen);
> + acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
> + ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
> + dlen = acomp_ctx->req->dlen;
> + mutex_unlock(acomp_ctx->mutex);

Say a request comes in on CPU1. After the mutex_lock() you get migrated to
CPU2. You do crypto_wait_req() on CPU2. In that time another request
gets in on CPU1. It blocks on the very same mutex. No data corruption but
it could have used another buffer instead of blocking and waiting for
the previous one to finish its work.
So it might make sense to have a pool of pages which are shared in the
system globally system instead of having strict per-CPU allocations
while being fully migrate-able while the are in use.

While at it: For dst you could use sg_set_page(). This would avoid the
kmap().

> + kunmap(page);
> BUG_ON(ret);
> BUG_ON(dlen != PAGE_SIZE);
>

Sebastian