Re: [PATCH for-next v3 0/9] mm/slab: introduce kfree_rcu_nolock() and improve slub_kunit coverage

From: Harry Yoo

Date: Mon Jun 15 2026 - 07:44:00 EST


I'm currently investigating and fixing two pre-existing kmalloc_nolock()
bugs, and (hopefully) will post the fixes later this week.

I will later rebase this series onto Vlastimil's slab_alloc_flags v3
and the kmalloc_nolock() fixes, and address the comments from human
reviewers and sashiko.

But this should be enough for review.

Thanks!

On 6/15/26 8:05 PM, Harry Yoo (Oracle) wrote:
> Not the best time to post a series, but I didn't want to delay posting
> it for too long. No pressure ;) This is intended to be queued for
> review and testing after the merge window closes.
>
> This series is based on next-20260612, and is also available on
> git.kernel.org [3].
>
> To RCU folks: It would be great if you could kindly take a quick look at
> patch 4 and either ack or nack the patch ;)
>
> To BPF folks: Ulad asked me to share workloads for measuring the
> performance of kfree_rcu_nolock(). Unfortunately, I focused more on
> correctness and have not spent much effort on performance evaluation.
> It would be nice if BPF folks could help evaluate it on their relevant
> workloads.
>
> To PREEMPT_RT folks: The most relevant part is allowing
> kfree_rcu_sheaf() on PREEMPT_RT (patch 6). It acquires locks only via
> local_trylock() or spin_trylock_irqsave(), so that it never sleeps
> while a raw spinlock is held. When trylock or unlock is unsafe,
> kmalloc_nolock() always fails.
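
A minimal userspace sketch of the trylock-or-fail pattern described
above, for readers unfamiliar with it. This is not code from the series:
struct sketch_lock, struct sketch_cache and sketch_alloc_nolock() are
hypothetical names, and a C11 atomic_flag stands in for the kernel's
local_trylock() / spin_trylock_irqsave().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a lock that must never be waited on. */
struct sketch_lock {
	atomic_flag held;
};

static bool sketch_trylock(struct sketch_lock *l)
{
	/* test_and_set returns the previous value: false means we got it. */
	return !atomic_flag_test_and_set_explicit(&l->held,
						  memory_order_acquire);
}

static void sketch_unlock(struct sketch_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* A one-slot "cache" protected by the lock (illustration only). */
struct sketch_cache {
	struct sketch_lock lock;
	void *slot;
};

/*
 * Return the cached object, or NULL if the lock is contended or the
 * slot is empty. The caller never spins or sleeps.
 */
static void *sketch_alloc_nolock(struct sketch_cache *c)
{
	void *obj;

	if (!sketch_trylock(&c->lock))
		return NULL;	/* fail instead of waiting */

	obj = c->slot;
	c->slot = NULL;
	sketch_unlock(&c->lock);
	return obj;
}
```

The point of the design is that a failed trylock is reported to the
caller as an allocation failure rather than resolved by waiting.
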
>
> Changes since RFC v2
> ====================
>
> Reduced complexity and intrusiveness (Uladzislau Rezki)
> -------------------------------------------------------
>
> While discussing concerns about the complexity of adding allow_spin
> handling with Ulad (Thanks!), I realized that adding complexity to the
> kvfree_rcu batching is not strictly necessary: only slab objects need to
> be batched, they are already batched by rcu sheaves, and slab already
> supports unknown context. So it is enough to implement only a minimal
> fallback for the sheaves path.
>
> I tried to avoid making intrusive changes to the existing kvfree_rcu
> path as much as possible. struct rcu_ptr is renamed to kfree_rcu_head
> following Vlastimil's suggestion, and it is used only in the
> kfree_rcu_nolock() path for now.
>
> As a result, the complexity is significantly reduced and the series
> became much less intrusive. This is also reflected well in the diffstat
> below.
>
> RFC v2 diffstat:
> 8 files changed, 514 insertions(+), 163 deletions(-)
>
> v3 diffstat:
> 6 files changed, 370 insertions(+), 105 deletions(-)
>
> v3 diffstat (excluding the slub_kunit improvements in patches 1, 2 and 9):
> 5 files changed, 199 insertions(+), 66 deletions(-)
>
> kfree_rcu_sheaf() PREEMPT_RT support (Vlastimil Babka)
> ------------------------------------------------------
>
> As suggested by Vlastimil (Thanks!), kfree_rcu_sheaf() can now be used
> on PREEMPT_RT as well, by always assuming allow_spin is false on
> PREEMPT_RT.
>
> slub_kunit enhancements
> -----------------------
>
> - Currently the test is skipped when there is no hardware PMU. This can
>   happen on machines without a PMU, or in virtualized environments
>   (e.g., automated testing or virtme). Implement a fallback based on SW
>   perf events so that the test can still run in such environments, even
>   though the coverage is slightly smaller.
>
> - While testing on PREEMPT_RT, I found that kmalloc_nolock() fails every
>   time, so the fallback path is not properly tested. This is a limitation
>   of perf events: the handler is called in NMI (HW perf events) or
>   interrupt context (SW perf events), where kmalloc_nolock() cannot
>   succeed.
>
>   slub_kunit now registers a kprobe pre-handler at the points in the slab
>   allocator where lockdep_assert_held() is invoked. The pre-handler calls
>   kmalloc_nolock() and friends, to improve coverage on PREEMPT_RT instead
>   of relying on perf events.
>
> One thing that needs to be further explored
> -------------------------------------------
>
> The global deferred_free_by_rcu list (introduced by patch 8) used for
> the fallback should probably be per-CPU [5].
>
> Actual Cover Letter
> ===================
>
> This series improves kmalloc_nolock() and kfree_nolock() coverage
> in slub_kunit (patches 1 and 2) and introduces kfree_rcu_nolock() for
> an unknown context as suggested by Alexei Starovoitov.
>
> Unknown context means the caller does not know whether spinning on a lock
> is safe (e.g., a BPF program attached to an arbitrary kernel function or
> in NMI context).
>
> The slab allocator already supports unknown context via kmalloc_nolock()
> and kfree_nolock(), but it does not support freeing objects by RCU in
> unknown context.
>
> It is not ideal to have completely separate batching for unknown context
> because the worst-case scenario, where spinning on a lock would lead to
> deadlock, is very rare, and in most cases it is safe to use the
> existing mechanism (kfree_rcu_sheaf()).
>
> Since most of the slab allocator already supports unknown context
> and sheaves support batching kvfree_rcu() calls for slab objects,
> implement kfree_rcu_nolock() with minimal changes by teaching
> kfree_rcu_sheaf() how to support unknown context and making
> it a little bit harder to allocate an empty sheaf, instead of making
> intrusive changes to the existing kvfree_rcu batching logic.
>
> kfree_rcu_nolock() tries to free the object to the rcu sheaf if
> trylock succeeds. Once the rcu sheaf becomes full, it is submitted to
> RCU via call_rcu() if spinning is allowed or IRQs are enabled (to avoid
> calling call_rcu() in the middle of call_rcu()). Otherwise, call_rcu()
> is deferred via irq work.
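
The submission rule in the paragraph above can be sketched as a small
decision function. This is only an illustration of the stated condition,
not the code in mm/slub.c: enum sketch_submit and
sketch_submit_full_sheaf() are hypothetical names, and the two booleans
stand in for the allow_spin flag and the current IRQ state.

```c
#include <stdbool.h>

/* What to do with an rcu sheaf that has just become full. */
enum sketch_submit {
	SKETCH_CALL_RCU,	/* hand the sheaf to call_rcu() right away */
	SKETCH_IRQ_WORK,	/* defer the call_rcu() to irq work */
};

/*
 * call_rcu() is used directly when spinning is allowed or IRQs are
 * enabled. Otherwise the caller might already be inside call_rcu(),
 * so the submission is deferred.
 */
static enum sketch_submit sketch_submit_full_sheaf(bool allow_spin,
						   bool irqs_enabled)
{
	if (allow_spin || irqs_enabled)
		return SKETCH_CALL_RCU;

	return SKETCH_IRQ_WORK;
}
```

Only the combination of !allow_spin and IRQs disabled takes the deferred
path, which matches the expectation that the unsafe case is rare.
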
>
> In unknown context, when there is no sheaf available, kfree_rcu_sheaf()
> falls back to defer_kfree_rcu(), which inserts the object into a global
> lockless list [5]. Those objects are freed after synchronize_rcu() in
> a workqueue.
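
For readers unfamiliar with lockless lists, here is a userspace sketch
of the push / detach-all pattern that such a fallback list relies on,
written with C11 atomics. It mirrors the semantics of the kernel's
llist_add() and llist_del_all(), but whether the series uses llist for
this list is an assumption here, and all sketch_* names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

struct sketch_node {
	struct sketch_node *next;
};

struct sketch_list {
	_Atomic(struct sketch_node *) first;
};

/*
 * Push one node without taking a lock. Returns nonzero if the list was
 * empty before the push (a natural point to schedule the worker).
 */
static int sketch_list_add(struct sketch_list *l, struct sketch_node *n)
{
	struct sketch_node *old =
		atomic_load_explicit(&l->first, memory_order_relaxed);

	do {
		/* On CAS failure, old is reloaded with the current head. */
		n->next = old;
	} while (!atomic_compare_exchange_weak_explicit(&l->first, &old, n,
							memory_order_release,
							memory_order_relaxed));
	return old == NULL;
}

/* Detach the whole list at once; the caller then owns every node. */
static struct sketch_node *sketch_list_del_all(struct sketch_list *l)
{
	return atomic_exchange_explicit(&l->first,
					(struct sketch_node *)NULL,
					memory_order_acquire);
}
```

In the scheme described above, the producer side would be the unknown
context, and the consumer side would be the workqueue, which detaches
the list, waits for a grace period, and then frees each node.
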
>
> Unlike kfree_rcu(), only the 2-argument variant is supported.
> This is because the last resort of the 1-arg variant is
> synchronize_rcu(), which cannot be used in an unknown context.
>
> As suggested by Alexei Starovoitov, kfree_rcu_nolock() can be used with
> struct kfree_rcu_head (8 bytes), which is smaller than struct rcu_head
> (16 bytes).
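
The size difference can be illustrated with two userspace structs.
struct sketch_rcu_head has the same shape as the kernel's struct
rcu_head (a link pointer plus a callback pointer). The single-pointer
layout of struct sketch_kfree_rcu_head is an assumption made for this
sketch, not the definition from patch 8. The 8 and 16 byte figures hold
on 64-bit targets.

```c
#include <stddef.h>

/* Same shape as struct rcu_head: next pointer plus callback pointer. */
struct sketch_rcu_head {
	struct sketch_rcu_head *next;
	void (*func)(struct sketch_rcu_head *head);
};

/* Assumed layout: only a link pointer, no callback slot. */
struct sketch_kfree_rcu_head {
	struct sketch_kfree_rcu_head *next;
};

/* The smaller head is exactly one pointer wide. */
_Static_assert(sizeof(struct sketch_kfree_rcu_head) == sizeof(void *),
	       "kfree head sketch should be one pointer");

/* It must be strictly smaller than the two-pointer head. */
_Static_assert(sizeof(struct sketch_kfree_rcu_head) <
	       sizeof(struct sketch_rcu_head),
	       "kfree head sketch should be smaller than rcu head sketch");
```

Saving one pointer per object matters most for small, numerous objects
that embed the head.
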
>
> For more background and future plans, please see [4].
>
> [1] RFC v1: https://lore.kernel.org/linux-mm/20260206093410.160622-1-harry.yoo@xxxxxxxxxx
>
> [2] RFC v2: https://lore.kernel.org/linux-mm/20260416091022.36823-1-harry@xxxxxxxxxx
>
> [3] https://git.kernel.org/pub/scm/linux/kernel/git/harry/linux.git/log/?h=kfree_rcu_nolock-v3r3
>
> [4] kmalloc_nolock() follow-ups, including kfree_rcu_nolock(),
> https://lore.kernel.org/linux-mm/esepccfhqg7m6jo76ns2znj2cnuaepx2xvw5zaygtwohq4psma@563ypprp6rr3
>
> [5] However, we should probably make the list percpu because,
> unlike RFC v2, it can be triggered more frequently under memory
> pressure.
>
> https://lore.kernel.org/linux-mm/805c33d7-3a7b-470c-bd9d-065717a3e3e2@paulmck-laptop
>
> Signed-off-by: Harry Yoo (Oracle) <harry@xxxxxxxxxx>
> ---
> Harry Yoo (Oracle) (9):
>       slub_kunit: fall back to SW perf events when HW PMU is not available
>       mm/slab, slub_kunit: register kprobe to trigger _nolock APIs
>       mm/slab: handle the !allow_spin case in kfree_rcu_sheaf()
>       mm/slab: use call_rcu() in unknown context if irqs are enabled
>       mm/slab: extend deferred free mechanism to handle rcu sheaves
>       mm/slab: allow kfree_rcu_sheaf() on PREEMPT_RT
>       mm/slab: introduce kfree_rcu_nolock()
>       mm/slab: introduce struct kfree_rcu_head and use in kfree_rcu_nolock()
>       slub_kunit: extend the test for kfree_rcu_nolock()
>
>  include/linux/rcupdate.h |  12 +++
>  include/linux/types.h    |   4 +
>  lib/tests/slub_kunit.c   | 174 ++++++++++++++++++++++++++++------
>  mm/slab.h                |   5 +-
>  mm/slab_common.c         |  38 ++++++--
>  mm/slub.c                | 242 ++++++++++++++++++++++++++++++++++-------------
>  6 files changed, 370 insertions(+), 105 deletions(-)
> ---
> base-commit: c425609d6ac4012c8bbf01ec2e10e801b1923a7b
> change-id: 20260615-kfree_rcu_nolock-e5502555992f
>
> Best regards,

--
Cheers,
Harry / Hyeonggon
