Re: [PATCH] mm/slub: skip freelist construction for whole-slab bulk refill

From: hu.shengming

Date: Mon Mar 30 2026 - 09:43:58 EST


> On 3/28/26 05:55, hu.shengming@xxxxxxxxxx wrote:
> > From: Shengming Hu <hu.shengming@xxxxxxxxxx>
> >
> > refill_objects() still carries a long-standing note that a whole-slab
> > bulk refill could avoid building a freelist that is immediately drained.
>
> "still" and "long-standing", huh :) it was added in 7.0-rc1
> but nevermind, good to address it anyway
>

Hi Vlastimil,
Haha, fair point — that wording was indeed a bit inaccurate ;-)

> > When the remaining bulk allocation is large enough to fully consume a
> > new slab, constructing the freelist is unnecessary overhead. Instead,
> > allocate the slab without building its freelist and hand out all objects
> > directly to the caller. The slab is then initialized as fully in-use.
> >
> > Keep the existing behavior when CONFIG_SLAB_FREELIST_RANDOM is enabled,
> > as freelist construction is required to provide randomized object order.
>
> That's a good point and we should not jeopardize the randomization.
> However, virtually all distro kernels enable it [1] so the benefits of this
> patch would not apply to them.
>
> But I think with some refactoring, it should be possible to reuse the
> relevant code (i.e. next_freelist_entry()) to store the object pointers into
> the bulk alloc array (i.e. sheaf) in the randomized order, without building
> it as a freelist? So I'd suggest trying that and measuring the result.
>
> [1]
> https://oracle.github.io/kconfigs/?config=UTS_RELEASE&config=SLAB_FREELIST_RANDOM
>

Thanks a lot for the great suggestion!

Fully agreed: preserving CONFIG_SLAB_FREELIST_RANDOM behavior is non-negotiable,
and the optimization would have little practical value for most distro kernels
if it only applied with the option disabled.

I'll try the refactoring you suggest, measure the performance impact across the
same object sizes and batch sizes as above, and include the results with v2.

> > Additionally, mark setup_object() as inline. After introducing this
> > optimization, the compiler no longer consistently inlines this helper,
> > which can regress performance in this hot path. Explicitly marking it
> > inline restores the expected code generation.
> >
> > This reduces per-object overhead in bulk allocation paths and improves
> > allocation throughput significantly.
> >
> > Benchmark results (slub_bulk_bench):
> >
> > Machine: qemu-system-x86_64 -m 1024M -smp 8
> > Kernel: Linux 7.0.0-rc5-next-20260326
> > Config: x86_64_defconfig
> > Rounds: 20
> > Total: 256MB
> >
> > obj_size=16, batch=256:
> > before: 28.80 ± 1.20 ns/object
> > after: 17.95 ± 0.94 ns/object
> > delta: -37.7%
> >
> > obj_size=32, batch=128:
> > before: 33.00 ± 0.00 ns/object
> > after: 21.75 ± 0.44 ns/object
> > delta: -34.1%
> >
> > obj_size=64, batch=64:
> > before: 44.30 ± 0.73 ns/object
> > after: 30.60 ± 0.50 ns/object
> > delta: -30.9%
> >
> > obj_size=128, batch=32:
> > before: 81.40 ± 1.85 ns/object
> > after: 47.00 ± 0.00 ns/object
> > delta: -42.3%
> >
> > obj_size=256, batch=32:
> > before: 101.20 ± 1.28 ns/object
> > after: 52.55 ± 0.60 ns/object
> > delta: -48.1%
> >
> > obj_size=512, batch=32:
> > before: 109.40 ± 2.30 ns/object
> > after: 53.80 ± 0.62 ns/object
> > delta: -50.8%
>
> That's encouraging!
> Thanks,
> Vlastimil

Thanks!
--
With Best Regards,
Shengming