Re: [PATCH RFC v2 00/10] SLUB percpu sheaves
From: Kent Overstreet
Date: Sat Feb 22 2025 - 19:19:55 EST
On Fri, Feb 14, 2025 at 05:27:36PM +0100, Vlastimil Babka wrote:
> - Cheaper fast paths. For allocations, instead of local double cmpxchg,
> after Patch 5 it's preempt_disable() and no atomic operations. Same for
> freeing, which is normally a local double cmpxchg only for short-term
> allocations (so the same slab is still active on the same cpu when
> freeing the object) and a more costly locked double cmpxchg otherwise.
> The downside is the lack of NUMA locality guarantees for the allocated
> objects.
Is that really cheaper than a local non-locked double cmpxchg?
Especially if you now have to use pushf/popf...
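To make the comparison concrete, here is a minimal userspace sketch of the
kind of fast path being discussed. It assumes a sheaf is just a fixed-size
array of object pointers used as a stack. The names `struct sheaf`,
`SHEAF_CAPACITY`, `sheaf_alloc()` and `sheaf_free()` are hypothetical and are
not taken from the patch set; the points where the kernel would disable and
re-enable preemption are only marked in comments.

```c
#include <assert.h>
#include <stddef.h>

#define SHEAF_CAPACITY 4 /* hypothetical size, for illustration only */

/* Hypothetical per-CPU sheaf: a plain array used as a stack. */
struct sheaf {
	size_t size;                   /* number of cached objects */
	void *objects[SHEAF_CAPACITY]; /* cached object pointers */
};

/* Pop one object. Returns NULL when the sheaf is empty, which is
 * where a real allocator would take a slow path to refill. */
static void *sheaf_alloc(struct sheaf *s)
{
	void *obj = NULL;

	assert(s != NULL);
	/* kernel: preempt_disable() would go here */
	if (s->size > 0)
		obj = s->objects[--s->size];
	/* kernel: preempt_enable() would go here */
	return obj;
}

/* Push one object. Returns 0 on success, -1 when the sheaf is full,
 * which is where a real allocator would take a slow path. */
static int sheaf_free(struct sheaf *s, void *obj)
{
	int ret = -1;

	assert(s != NULL);
	/* kernel: preempt_disable() would go here */
	if (s->size < SHEAF_CAPACITY) {
		s->objects[s->size++] = obj;
		ret = 0;
	}
	/* kernel: preempt_enable() would go here */
	return ret;
}
```

The sketch only shows that the common case needs no atomic instruction; it
says nothing about whether that is actually faster than a local cmpxchg16b,
which is the open question above.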
> - kfree_rcu() batching and recycling. kfree_rcu() will put objects into a
> separate percpu sheaf and only submit the whole sheaf to call_rcu()
> when full. After the grace period, the sheaf can be used for
> allocations, which is more efficient than freeing and reallocating
> individual slab objects (even with the batching done by kfree_rcu()
> implementation itself). In case only some cpus are allowed to handle rcu
> callbacks, the sheaf can still be made available to other cpus on the
> same node via the shared barn. The maple_node cache uses kfree_rcu() and
> thus can benefit from this.
Have you looked at fs/bcachefs/rcu_pending.c?
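For reference, the batching scheme quoted above can be sketched in userspace
roughly as follows. This is an illustration under stated assumptions, not the
patch's code and not what rcu_pending.c does: `struct cpu_sheaves`,
`sketch_kfree_rcu()`, `sketch_grace_period_elapsed()` and `sketch_alloc()`
are hypothetical names, the grace period is simulated by an explicit call,
and at most one sheaf is assumed to be in flight at a time.

```c
#include <stddef.h>

#define SHEAF_CAPACITY 3 /* hypothetical size, for illustration only */

struct sheaf {
	size_t size;
	void *objects[SHEAF_CAPACITY];
};

/* Hypothetical per-CPU state for the sketch. */
struct cpu_sheaves {
	struct sheaf rcu_free;        /* collects kfree_rcu()'d objects */
	struct sheaf pending;         /* submitted, waiting for grace period */
	struct sheaf main;            /* objects available for allocation */
	unsigned int rcu_submissions; /* stands in for call_rcu() calls */
};

/* Queue an object for deferred freeing. Returns 1 if this call filled
 * the sheaf and submitted it as a whole, 0 if the object was only
 * queued, -1 if there was no room (a real caller would fall back to
 * the regular kfree_rcu() path). */
static int sketch_kfree_rcu(struct cpu_sheaves *pcs, void *obj)
{
	struct sheaf *rs = &pcs->rcu_free;

	if (rs->size == SHEAF_CAPACITY)
		return -1;
	rs->objects[rs->size++] = obj;
	if (rs->size == SHEAF_CAPACITY && pcs->pending.size == 0) {
		pcs->pending = *rs; /* one submission for the whole batch */
		rs->size = 0;
		pcs->rcu_submissions++;
		return 1;
	}
	return 0; /* a full sheaf that could not be submitted stays full */
}

/* Simulated RCU callback: the pending sheaf becomes reusable, but
 * only if the main sheaf is empty in this simplified sketch. */
static void sketch_grace_period_elapsed(struct cpu_sheaves *pcs)
{
	if (pcs->pending.size == 0 || pcs->main.size != 0)
		return;
	pcs->main = pcs->pending;
	pcs->pending.size = 0;
}

/* Allocate from the main sheaf; NULL means "take the slow path". */
static void *sketch_alloc(struct cpu_sheaves *pcs)
{
	struct sheaf *m = &pcs->main;

	return m->size > 0 ? m->objects[--m->size] : NULL;
}
```

With a capacity of three, three sketch_kfree_rcu() calls result in a single
submission, and after the simulated grace period the same three objects can
be handed out again without going back to the slab.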