Re: [PATCH RFC v2 00/10] SLUB percpu sheaves

From: Suren Baghdasaryan
Date: Sat Feb 22 2025 - 23:44:48 EST

Next message: Adam Simonelli: "Re: [PATCH v2 2/2] tty: Change order of ttynull to be loaded sooner."
Previous message: Suren Baghdasaryan: "Re: [PATCH RFC v2 10/10] maple_tree: use percpu sheaves for maple_node_cache"
In reply to: Kent Overstreet: "Re: [PATCH RFC v2 00/10] SLUB percpu sheaves"
Next in thread: Suren Baghdasaryan: "Re: [PATCH RFC v2 00/10] SLUB percpu sheaves"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Sat, Feb 22, 2025 at 4:19 PM Kent Overstreet
<kent.overstreet@xxxxxxxxx> wrote:
>
> On Fri, Feb 14, 2025 at 05:27:36PM +0100, Vlastimil Babka wrote:
> > - Cheaper fast paths. For allocations, instead of local double cmpxchg,
> > after Patch 5 it's preempt_disable() and no atomic operations. Same for
> > freeing, which is normally a local double cmpxchg only for short-term
> > allocations (so the same slab is still active on the same cpu when
> > freeing the object) and a more costly locked double cmpxchg otherwise.
> > The downside is the lack of NUMA locality guarantees for the allocated
> > objects.
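To make the fast-path comparison concrete, here is a minimal userspace sketch of a percpu sheaf as described above. It models the sheaf as a per-CPU array of cached object pointers, so allocation and freeing become a plain array pop and push with no atomic operations. The names (struct sheaf, sheaf_alloc(), sheaf_free()) and SHEAF_CAPACITY are hypothetical, not the patch's API, and a single thread stands in for the preempt_disable() section.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capacity chosen for this model only. */
#define SHEAF_CAPACITY 4

/*
 * Userspace model of a percpu sheaf: a plain array of cached object
 * pointers owned by one CPU. In the kernel these operations would run
 * between preempt_disable() and preempt_enable(); here a single thread
 * stands in for that, so no atomic operations are needed.
 */
struct sheaf {
	size_t size;                   /* number of cached objects */
	void *objects[SHEAF_CAPACITY]; /* LIFO stack of free objects */
};

/* Fast-path allocation: pop the most recently freed object. */
static void *sheaf_alloc(struct sheaf *s)
{
	if (s->size == 0)
		return NULL; /* empty: a real allocator would take a slow path */
	return s->objects[--s->size];
}

/* Fast-path free: push the object; returns 0 if the sheaf is full. */
static int sheaf_free(struct sheaf *s, void *obj)
{
	assert(obj != NULL);
	if (s->size == SHEAF_CAPACITY)
		return 0; /* full: a real allocator would take a slow path */
	s->objects[s->size++] = obj;
	return 1;
}
```

In this model the array holds whatever objects were freed on this CPU, from any slab, which illustrates why such a cache cannot by itself guarantee NUMA locality for the objects it hands out.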
>
> Is that really cheaper than a local non-locked double cmpxchg?

Don't know about this particular part, but testing sheaves with the maple
node cache and stress-testing mmap/munmap syscalls shows performance
benefits as long as there is some delay to let kfree_rcu() do its job.
I'm still gathering results and will most likely post them tomorrow.

>
> Especially if you now have to use pushf/popf...
>
> > - kfree_rcu() batching and recycling. kfree_rcu() will put objects into a
> > separate percpu sheaf and only submit the whole sheaf to call_rcu()
> > when full. After the grace period, the sheaf can be used for
> > allocations, which is more efficient than freeing and reallocating
> > individual slab objects (even with the batching done by kfree_rcu()
> > implementation itself). In case only some cpus are allowed to handle rcu
> > callbacks, the sheaf can still be made available to other cpus on the
> > same node via the shared barn. The maple_node cache uses kfree_rcu() and
> > thus can benefit from this.
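The kfree_rcu() batching and recycling described above can be sketched the same way. This is a simplified single-threaded model that assumes hypothetical names (struct rcu_model, model_kfree_rcu(), model_grace_period_end(), model_alloc()) and a fixed MAX_SHEAVES bound; no real RCU is involved. Objects are batched into one sheaf, and only a full sheaf is submitted, standing in for a single call_rcu() callback. When the grace period ends, the whole sheaf goes to the barn and can be swapped in for allocations instead of being freed object by object.

```c
#include <assert.h>
#include <stddef.h>

#define SHEAF_CAPACITY 4 /* hypothetical sheaf size for this model */
#define MAX_SHEAVES 8    /* hypothetical bound so the model needs no malloc */

struct sheaf {
	size_t size;
	void *objects[SHEAF_CAPACITY];
};

/* Hypothetical model state for one CPU and its node's barn. */
struct rcu_model {
	struct sheaf rcu_free;             /* collects kfree_rcu()'d objects */
	struct sheaf pending[MAX_SHEAVES]; /* submitted, grace period not over */
	size_t nr_pending;
	struct sheaf barn[MAX_SHEAVES];    /* full sheaves ready for reuse */
	size_t nr_barn;
	size_t nr_callbacks;               /* count of call_rcu()-like submissions */
};

/* Model of kfree_rcu(): batch the object, submit the sheaf when full. */
static void model_kfree_rcu(struct rcu_model *m, void *obj)
{
	struct sheaf *s = &m->rcu_free;

	s->objects[s->size++] = obj;
	if (s->size == SHEAF_CAPACITY) {
		assert(m->nr_pending < MAX_SHEAVES);
		/* One callback for the whole sheaf instead of one per object. */
		m->pending[m->nr_pending++] = *s;
		m->nr_callbacks++;
		s->size = 0;
	}
}

/*
 * Model of the callback after a grace period: the sheaf is not freed
 * object by object but handed to the barn as a full sheaf.
 */
static void model_grace_period_end(struct rcu_model *m)
{
	while (m->nr_pending > 0) {
		assert(m->nr_barn < MAX_SHEAVES);
		m->barn[m->nr_barn++] = m->pending[--m->nr_pending];
	}
}

/* Allocate from cpu_sheaf, swapping in a full barn sheaf when it is empty. */
static void *model_alloc(struct rcu_model *m, struct sheaf *cpu_sheaf)
{
	if (cpu_sheaf->size == 0) {
		if (m->nr_barn == 0)
			return NULL; /* would fall back to the slab slow path */
		*cpu_sheaf = m->barn[--m->nr_barn];
	}
	return cpu_sheaf->objects[--cpu_sheaf->size];
}
```

After SHEAF_CAPACITY objects are freed, the model records one submission rather than one per object, and the first allocation after the grace period is served directly from the recycled sheaf.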
>
> Have you looked at fs/bcachefs/rcu_pending.c?
