Re: [PATCH 0/2] zswap pool per-CPU acomp_ctx simplifications

From: Kanchana P. Sridhar

Date: Sat Mar 14 2026 - 23:30:52 EST


On Sat, Mar 14, 2026 at 5:11 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, 13 Mar 2026 22:16:30 -0700 "Kanchana P. Sridhar" <kanchanapsridhar2026@xxxxxxxxx> wrote:
>
> >
> > This patchset persists the zswap pool's per-CPU acomp_ctx resources to
> > last until the pool is destroyed. It then simplifies the per-CPU
> > acomp_ctx mutex locking in zswap_compress()/zswap_decompress().
> >
> > Further, it consistently uses the same checks for valid
> > acomp_ctx->acomp/req in zswap procedures that allocate/deallocate
> > acomp_ctx members.
>
> Sashiko has questions:
> https://sashiko.dev/#/patchset/20260314051632.17931-1-kanchanapsridhar2026%40gmail.com
>

Thanks Andrew, for sharing the questions raised by Sashiko. I have
provided some responses below:

> > @@ -786,7 +786,7 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
> > return ret;
> >
> > acomp_ctx->acomp = crypto_alloc_acomp_node(pool->tfm_name, 0, 0, cpu_to_node(cpu));
> > - if (IS_ERR(acomp_ctx->acomp)) {
> > + if (IS_ERR_OR_NULL(acomp_ctx->acomp)) {
> Does crypto_alloc_acomp_node() ever return NULL?
> Looking at the error handling just below this check, if this were to
> actually return NULL, PTR_ERR(NULL) evaluates to 0. This would cause
> the function to incorrectly return 0 (success) instead of an error code,
> hiding the allocation failure.

This is a good catch. Just to provide context, this patch was
introduced based on Yosry's earlier comments in [1].

[1]: https://patchwork.kernel.org/comment/26282128/

crypto_alloc_acomp_node() currently does not return NULL. However, it
could, in future.
Since the rest of zswap_cpu_comp_prepare() dereferences
acomp_ctx->acomp, it depends on whether we want to future-proof the
code to handle a possible eventuality of crypto_alloc_acomp_node()
returning NULL.

If the maintainers think future-proofing is beneficial, I would need
to handle the PTR_ERR(NULL) which would send a false success status.
If we don't think we need to handle a future NULL return from
crypto_alloc_acomp_node(), then I don't think this change is needed.
We could leave it as IS_ERR(acomp_ctx->acomp). I would like to get the
maintainers' inputs on how to proceed.

> > acomp_ctx->req = acomp_request_alloc(acomp_ctx->acomp);
> > - if (!acomp_ctx->req) {
> > + if (IS_ERR_OR_NULL(acomp_ctx->req)) {
> Is this change necessary for acomp_request_alloc()?
> This function strictly returns NULL on allocation failure, not an error
> pointer. Changing this to IS_ERR_OR_NULL() obscures the actual API contract
> without providing a functional benefit.

As of now, acomp_request_alloc() returns a valid "req" or NULL in case
of an error. Same question as above. The only benefit would be making
the code more robust to handle changes in the acomp API in future.

> > pr_err("could not alloc crypto acomp_request %s\n",
> > pool->tfm_name);
> > goto fail;
> Can this error path lead to a double-free or use-after-free?
> This isn't a bug introduced by this patch, but while reviewing the modified
> error path, it appears that when zswap_cpu_comp_prepare() fails and jumps
> to fail, acomp_ctx_dealloc() frees acomp_ctx->acomp and acomp_ctx->buffer
> but does not set those pointers to NULL.
> Because acomp_ctx is persistent per-CPU memory, if the CPU later comes back
> online, the early initialization check:
> if (acomp_ctx->acomp) {
> WARN_ON_ONCE(IS_ERR(acomp_ctx->acomp));
> return 0;
> }
> would see the dangling pointer, incorrectly return success, and leave the
> CPU with an uninitialized mutex and freed buffer.

Thanks again, good catch. I was thinking some more about this today,
and realized this could happen if let's say a batch of CPUs were
online at pool creation time, and the startup callback succeeds for
all of them. Now, if a CPU "x" is brought online after pool creation
succeeds, encounters an error in zswap_cpu_comp_prepare(), calls
acomp_ctx_dealloc(), then CPU "x"'s state rewinding to CPU_OFFLINE. If
the CPU "x" is again brought back online, the above use-after-free
would occur. I had missed this during my response to Yosry's v14
comment [2] yesterday, sorry about that.

I will set the acomp_ctx pointers to NULL, as Yosry suggested in [2].

[2]: https://patchwork.kernel.org/comment/26773983/

> Additionally, if the pool creation fails entirely, the fallback cleanup loop
> in zswap_pool_create() calls acomp_ctx_dealloc() again for all CPUs, which
> would cause a double-free on the CPU that initially failed.
> Should acomp_ctx_dealloc() clear these pointers after freeing them?

Yes, I agree, and will include a fix in v2 once I get the maintainers'
feedback on how to proceed on patch 2.

Thanks,
Kanchana