Re: [PATCH] slab: distinguish lock and trylock for sheaf_flush_main()
From: Vlastimil Babka (SUSE)
Date: Thu Feb 26 2026 - 09:54:07 EST
On 2/11/26 10:42, Vlastimil Babka wrote:
> sheaf_flush_main() can be called from __pcs_replace_full_main() where
> the trylock can in theory fail, and pcs_flush_all() where it's not
> expected to and it would be actually a problem if it failed and left the
> main sheaf not flushed.
Thinking about this more, I now think it's not just a theoretical issue: on
PREEMPT_RT, pcs_flush_all() can preempt a task holding the lock (on
PREEMPT_RT the preempting context doesn't have to be an irq handler), and
then silently fail to flush the main sheaf.
The impact is probably limited, though - if this failure to flush happens in
__kmem_cache_shutdown(), it means someone was destroying a cache while still
using it, which was already buggy. slab_mem_going_offline_callback() could be
where this matters, although it's unlikely anyone would combine memory
hotplug with PREEMPT_RT.
But it may still be worth tagging this as Fixes: 2d517aa09bbc ("slab: add
opt-in caching layer of percpu sheaves"), Cc'ing stable, and sending it as a
hotfix.
> To make this explicit, split the function into sheaf_flush_main() (using
> local_lock()) and sheaf_try_flush_main() (using local_trylock()) where
> both call __sheaf_flush_main_batch() to flush a single batch of objects.
> This will allow lockdep to verify our assumptions.
>
> Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx>
> ---
> mm/slub.c | 47 +++++++++++++++++++++++++++++++++++++----------
> 1 file changed, 37 insertions(+), 10 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 18c30872d196..12912b29f5bb 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2844,19 +2844,19 @@ static void __kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p);
> * object pointers are moved to a on-stack array under the lock. To bound the
> * stack usage, limit each batch to PCS_BATCH_MAX.
> *
> - * returns true if at least partially flushed
> + * Must be called with s->cpu_sheaves->lock locked, returns with the lock
> + * unlocked.
> + *
> + * Returns the number of objects remaining to be flushed
> */
> -static bool sheaf_flush_main(struct kmem_cache *s)
> +static unsigned int __sheaf_flush_main_batch(struct kmem_cache *s)
> {
> struct slub_percpu_sheaves *pcs;
> unsigned int batch, remaining;
> void *objects[PCS_BATCH_MAX];
> struct slab_sheaf *sheaf;
> - bool ret = false;
>
> -next_batch:
> - if (!local_trylock(&s->cpu_sheaves->lock))
> - return ret;
> + lockdep_assert_held(this_cpu_ptr(&s->cpu_sheaves->lock));
>
> pcs = this_cpu_ptr(s->cpu_sheaves);
> sheaf = pcs->main;
> @@ -2874,10 +2874,37 @@ static bool sheaf_flush_main(struct kmem_cache *s)
>
> stat_add(s, SHEAF_FLUSH, batch);
>
> - ret = true;
> + return remaining;
> +}
>
> - if (remaining)
> - goto next_batch;
> +static void sheaf_flush_main(struct kmem_cache *s)
> +{
> + unsigned int remaining;
> +
> + do {
> + local_lock(&s->cpu_sheaves->lock);
> +
> + remaining = __sheaf_flush_main_batch(s);
> +
> + } while (remaining);
> +}
> +
> +/*
> + * Returns true if the main sheaf was at least partially flushed.
> + */
> +static bool sheaf_try_flush_main(struct kmem_cache *s)
> +{
> + unsigned int remaining;
> + bool ret = false;
> +
> + do {
> + if (!local_trylock(&s->cpu_sheaves->lock))
> + return ret;
> +
> + ret = true;
> + remaining = __sheaf_flush_main_batch(s);
> +
> + } while (remaining);
>
> return ret;
> }
> @@ -5685,7 +5712,7 @@ __pcs_replace_full_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
> if (put_fail)
> stat(s, BARN_PUT_FAIL);
>
> - if (!sheaf_flush_main(s))
> + if (!sheaf_try_flush_main(s))
> return NULL;
>
> if (!local_trylock(&s->cpu_sheaves->lock))
>
> ---
> base-commit: 27125df9a5d3b4cfd03bce3a8ec405a368cc9aae
> change-id: 20260211-b4-sheaf-flush-2eb99a9c8bfb
>
> Best regards,