RE: [PATCH] mm: zswap: properly synchronize freeing resources during CPU hotunplug
From: Sridhar, Kanchana P
Date: Wed Jan 08 2025 - 15:23:33 EST
> -----Original Message-----
> From: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> Sent: Wednesday, January 8, 2025 8:15 AM
> To: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>; Nhat Pham <nphamcs@xxxxxxxxx>;
> Chengming Zhou <chengming.zhou@xxxxxxxxx>; Vitaly Wool <vitalywool@xxxxxxxxx>;
> Barry Song <baohua@xxxxxxxxxx>; Sam Sun <samsun1006219@xxxxxxxxx>;
> Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>; linux-mm@xxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx; Yosry Ahmed <yosryahmed@xxxxxxxxxx>;
> stable@xxxxxxxxxxxxxxx
> Subject: [PATCH] mm: zswap: properly synchronize freeing resources during CPU hotunplug
>
> In zswap_compress() and zswap_decompress(), the per-CPU acomp_ctx of the
> current CPU at the beginning of the operation is retrieved and used
> throughout. However, since neither preemption nor migration are
> disabled, it is possible that the operation continues on a different
> CPU.
>
> If the original CPU is hotunplugged while the acomp_ctx is still in use,
> we run into a UAF bug as some of the resources attached to the acomp_ctx
> are freed during hotunplug in zswap_cpu_comp_dead().
>
> The problem was introduced in commit 1ec3b5fe6eec ("mm/zswap: move to
> use crypto_acomp API for hardware acceleration") when the switch to the
> crypto_acomp API was made. Prior to that, the per-CPU crypto_comp was
> retrieved using get_cpu_ptr() which disables preemption and makes sure
> the CPU cannot go away from under us. Preemption cannot be disabled
> with the crypto_acomp API as a sleepable context is needed.
>
> During CPU hotunplug, hold the acomp_ctx.mutex before freeing any
> resources, and set acomp_ctx.req to NULL when it is freed. In the
> compress/decompress paths, after acquiring the acomp_ctx.mutex make sure
> that acomp_ctx.req is not NULL (i.e. acomp_ctx resources were not freed
> by CPU hotunplug). Otherwise, retry with the acomp_ctx from the new CPU.
>
> This adds proper synchronization to ensure that the acomp_ctx resources
> are not freed from under compress/decompress paths.
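
For anyone following along, the lock-then-recheck scheme described above
can be modeled in userspace roughly as below. This is only a sketch under
stated assumptions, not the kernel code: struct model_ctx, slots[],
current_slot and the model_*() helpers are hypothetical stand-ins for
struct crypto_acomp_ctx, the per-CPU area, the CPU the task runs on, and
the zswap functions. Taking the mutex in the "online" path is a choice of
this model only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NR_SLOTS 2	/* hypothetical: stands in for the possible CPUs */

/* Hypothetical userspace stand-in for struct crypto_acomp_ctx. */
struct model_ctx {
	pthread_mutex_t mutex;	/* never destroyed while the slot exists */
	void *req;		/* NULL means "resources were freed" */
};

static struct model_ctx slots[NR_SLOTS] = {
	{ PTHREAD_MUTEX_INITIALIZER, NULL },
	{ PTHREAD_MUTEX_INITIALIZER, NULL },
};

/* Stands in for "the CPU this task is currently running on". */
static _Atomic int current_slot;

/* Models CPU online: allocate the resources (locking is this model's choice). */
static int model_online(int i)
{
	int ret = 0;

	pthread_mutex_lock(&slots[i].mutex);
	if (!slots[i].req) {
		slots[i].req = malloc(64);
		if (!slots[i].req)
			ret = -1;
	}
	pthread_mutex_unlock(&slots[i].mutex);
	return ret;
}

/* Models zswap_cpu_comp_dead(): free under the mutex, then mark as freed. */
static void model_dead(int i)
{
	pthread_mutex_lock(&slots[i].mutex);
	free(slots[i].req);
	slots[i].req = NULL;
	pthread_mutex_unlock(&slots[i].mutex);
}

/* Models acomp_ctx_get_cpu_lock(): lock, recheck, retry if torn down. */
static struct model_ctx *model_get_lock(void)
{
	for (;;) {
		struct model_ctx *ctx = &slots[current_slot];

		pthread_mutex_lock(&ctx->mutex);
		if (ctx->req)
			return ctx;	/* returned with the mutex held */
		/* The slot was torn down before we got the lock; pick again. */
		pthread_mutex_unlock(&ctx->mutex);
	}
}

/* Models acomp_ctx_put_unlock(). */
static void model_put_unlock(struct model_ctx *ctx)
{
	/* While the caller held the lock, the resources must have stayed live. */
	assert(ctx->req != NULL);
	pthread_mutex_unlock(&ctx->mutex);
}
```

The point the model tries to capture is that the NULL check happens only
after the mutex is held, so a teardown either completed before the check
(and the caller retries) or has to wait until the caller unlocks.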
>
> Note that the per-CPU acomp_ctx itself (including the mutex) is not
> freed during CPU hotunplug, only acomp_ctx.req, acomp_ctx.buffer, and
> acomp_ctx.acomp. So it is safe to acquire the acomp_ctx.mutex of a CPU
> after it is hotunplugged.

The only other fail-proofing I can think of is to initialize the mutex right
after the per-CPU acomp_ctx is allocated in zswap_pool_create(), decoupling
it from CPU onlining. This would make it clearer that the mutex is meant to
have the same lifetime as the acomp_ctx itself, independent of CPU
hotplug/hotunplug.
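
To illustrate the lifetime split, here is a rough userspace model, offered
as a sketch rather than a proposed change to mm/zswap.c. Everything named
model_* is hypothetical: model_pool_create() stands in for
zswap_pool_create(), model_cpu_prepare() for the CPU online callback, and
model_cpu_dead() for zswap_cpu_comp_dead(). The buffer sizes are arbitrary.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct crypto_acomp_ctx. */
struct model_ctx {
	pthread_mutex_t mutex;
	void *req;
	void *buffer;
};

/* Hypothetical stand-in for struct zswap_pool. */
struct model_pool {
	int nr_cpus;
	struct model_ctx *ctx;	/* one entry per possible CPU */
};

/* Pool creation: allocate the contexts AND initialize every mutex. */
static struct model_pool *model_pool_create(int nr_cpus)
{
	struct model_pool *pool = malloc(sizeof(*pool));

	if (!pool)
		return NULL;
	pool->nr_cpus = nr_cpus;
	pool->ctx = calloc(nr_cpus, sizeof(*pool->ctx));
	if (!pool->ctx) {
		free(pool);
		return NULL;
	}
	for (int i = 0; i < nr_cpus; i++)
		pthread_mutex_init(&pool->ctx[i].mutex, NULL);
	return pool;
}

/* CPU online: allocate resources only; no mutex initialization here. */
static int model_cpu_prepare(struct model_pool *pool, int cpu)
{
	struct model_ctx *ctx = &pool->ctx[cpu];
	int ret = 0;

	pthread_mutex_lock(&ctx->mutex);
	assert(!ctx->req && !ctx->buffer);	/* must be offline here */
	ctx->req = malloc(64);
	ctx->buffer = malloc(4096);
	if (!ctx->req || !ctx->buffer) {
		free(ctx->req);
		free(ctx->buffer);
		ctx->req = ctx->buffer = NULL;
		ret = -1;
	}
	pthread_mutex_unlock(&ctx->mutex);
	return ret;
}

/* CPU offline: free resources under the mutex; the mutex itself stays. */
static void model_cpu_dead(struct model_pool *pool, int cpu)
{
	struct model_ctx *ctx = &pool->ctx[cpu];

	pthread_mutex_lock(&ctx->mutex);
	free(ctx->req);
	free(ctx->buffer);
	ctx->req = ctx->buffer = NULL;
	pthread_mutex_unlock(&ctx->mutex);
}

/* Pool teardown: the only place where the mutexes are destroyed. */
static void model_pool_destroy(struct model_pool *pool)
{
	for (int i = 0; i < pool->nr_cpus; i++) {
		model_cpu_dead(pool, i);
		pthread_mutex_destroy(&pool->ctx[i].mutex);
	}
	free(pool->ctx);
	free(pool);
}
```

In this model the mutex of a context can be taken before its CPU has ever
come online and after it has gone offline, since only pool creation and
pool teardown touch the mutex lifetime.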

Thanks,
Kanchana

>
> Previously a fix was attempted by holding cpus_read_lock() [1]. This
> would have caused a potential deadlock as it is possible for code
> already holding the lock to fall into reclaim and enter zswap (causing a
> deadlock). A fix was also attempted using SRCU for synchronization, but
> Johannes pointed out that synchronize_srcu() cannot be used in CPU
> hotplug notifiers [2].
>
> Alternative fixes that were considered/attempted and could have worked:
> - Refcounting the per-CPU acomp_ctx. This involves complexity in
> handling the race between the refcount dropping to zero in
> zswap_[de]compress() and the refcount being re-initialized when the
> CPU is onlined.
> - Disabling migration before getting the per-CPU acomp_ctx [3], but
> that's discouraged and is a much bigger hammer than needed, and could
> result in subtle performance issues.
>
> [1]https://lkml.kernel.org/20241219212437.2714151-1-yosryahmed@xxxxxxxxxx/
> [2]https://lkml.kernel.org/20250107074724.1756696-2-yosryahmed@xxxxxxxxxx/
> [3]https://lkml.kernel.org/20250107222236.2715883-2-yosryahmed@xxxxxxxxxx/
>
> Fixes: 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for hardware acceleration")
> Cc: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> Reported-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> Closes: https://lore.kernel.org/lkml/20241113213007.GB1564047@xxxxxxxxxxx/
> Reported-by: Sam Sun <samsun1006219@xxxxxxxxx>
> Closes: https://lore.kernel.org/lkml/CAEkJfYMtSdM5HceNsXUDf5haghD5+o2e7Qv4OcuruL4tPg6OaQ@xxxxxxxxxxxxxx/
> ---
>
> This applies on top of the latest mm-hotfixes-unstable on top of 'Revert
> "mm: zswap: fix race between [de]compression and CPU hotunplug"' and
> after 'mm: zswap: disable migration while using per-CPU acomp_ctx' was
> dropped.
>
> ---
> mm/zswap.c | 42 +++++++++++++++++++++++++++++++++---------
> 1 file changed, 33 insertions(+), 9 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index f6316b66fb236..4e3148050e093 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -869,17 +869,46 @@ static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
> struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
> struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);
>
> + mutex_lock(&acomp_ctx->mutex);
> if (!IS_ERR_OR_NULL(acomp_ctx)) {
> if (!IS_ERR_OR_NULL(acomp_ctx->req))
> acomp_request_free(acomp_ctx->req);
> + acomp_ctx->req = NULL;
> if (!IS_ERR_OR_NULL(acomp_ctx->acomp))
> crypto_free_acomp(acomp_ctx->acomp);
> kfree(acomp_ctx->buffer);
> }
> + mutex_unlock(&acomp_ctx->mutex);
>
> return 0;
> }
>
> +static struct crypto_acomp_ctx *acomp_ctx_get_cpu_lock(
> + struct crypto_acomp_ctx __percpu *acomp_ctx)
> +{
> + struct crypto_acomp_ctx *ctx;
> +
> + for (;;) {
> + ctx = raw_cpu_ptr(acomp_ctx);
> + mutex_lock(&ctx->mutex);
> + if (likely(ctx->req))
> + return ctx;
> + /*
> + * It is possible that we were migrated to a different CPU after
> + * getting the per-CPU ctx but before the mutex was acquired. If
> + * the old CPU got offlined, zswap_cpu_comp_dead() could have
> + * already freed ctx->req (among other things) and set it to
> + * NULL. Just try again on the new CPU that we ended up on.
> + */
> + mutex_unlock(&ctx->mutex);
> + }
> +}
> +
> +static void acomp_ctx_put_unlock(struct crypto_acomp_ctx *ctx)
> +{
> + mutex_unlock(&ctx->mutex);
> +}
> +
> static bool zswap_compress(struct page *page, struct zswap_entry *entry,
> struct zswap_pool *pool)
> {
> @@ -893,10 +922,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
> gfp_t gfp;
> u8 *dst;
>
> - acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
> -
> - mutex_lock(&acomp_ctx->mutex);
> -
> + acomp_ctx = acomp_ctx_get_cpu_lock(pool->acomp_ctx);
> dst = acomp_ctx->buffer;
> sg_init_table(&input, 1);
> sg_set_page(&input, page, PAGE_SIZE, 0);
> @@ -949,7 +975,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
> else if (alloc_ret)
> zswap_reject_alloc_fail++;
>
> - mutex_unlock(&acomp_ctx->mutex);
> + acomp_ctx_put_unlock(acomp_ctx);
> return comp_ret == 0 && alloc_ret == 0;
> }
>
> @@ -960,9 +986,7 @@ static void zswap_decompress(struct zswap_entry *entry, struct folio *folio)
> struct crypto_acomp_ctx *acomp_ctx;
> u8 *src;
>
> - acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> - mutex_lock(&acomp_ctx->mutex);
> -
> + acomp_ctx = acomp_ctx_get_cpu_lock(entry->pool->acomp_ctx);
> src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
> /*
> * If zpool_map_handle is atomic, we cannot reliably utilize its mapped buffer
> @@ -986,10 +1010,10 @@ static void zswap_decompress(struct zswap_entry *entry, struct folio *folio)
> acomp_request_set_params(acomp_ctx->req, &input, &output,
> entry->length, PAGE_SIZE);
> BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait));
> BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE);
> - mutex_unlock(&acomp_ctx->mutex);
>
> if (src != acomp_ctx->buffer)
> zpool_unmap_handle(zpool, entry->handle);
> + acomp_ctx_put_unlock(acomp_ctx);
> }
>
> /*********************************
> --
> 2.47.1.613.gc27f4b7a9f-goog