RE: [PATCH 3/4] mm/slub: simplify get_partial_node()
From: Song, Xiongwei
Date: Tue Apr 02 2024 - 20:38:27 EST
>
> On 3/31/24 4:19 AM, xiongwei.song@xxxxxxxxxxxxx wrote:
> > From: Xiongwei Song <xiongwei.song@xxxxxxxxxxxxx>
> >
> > The break conditions can be made simpler and more readable.
> >
> > We can check whether we need to fill the cpu partial list after
> > getting the first partial slab. If kmem_cache_has_cpu_partial()
> > returns true, we fill the cpu partial list from the next iteration
> > onward; otherwise we break out of the loop.
> >
> > This also lets us remove the preprocessor condition on
> > CONFIG_SLUB_CPU_PARTIAL. A dummy slub_get_cpu_partial() keeps the
> > compiler quiet when the option is disabled.
> >
> > Signed-off-by: Xiongwei Song <xiongwei.song@xxxxxxxxxxxxx>
> > ---
> > mm/slub.c | 22 ++++++++++++----------
> > 1 file changed, 12 insertions(+), 10 deletions(-)
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 590cc953895d..ec91c7435d4e 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2614,18 +2614,20 @@ static struct slab *get_partial_node(struct kmem_cache *s,
> > if (!partial) {
> > partial = slab;
> > stat(s, ALLOC_FROM_PARTIAL);
> > - } else {
> > - put_cpu_partial(s, slab, 0);
> > - stat(s, CPU_PARTIAL_NODE);
> > - partial_slabs++;
> > +
> > + /* Fill cpu partial if needed from next iteration, or break */
> > + if (kmem_cache_has_cpu_partial(s))
>
> That kinda puts back the check removed in patch 1, although only in the
> first iteration. Still not ideal.
>
> > + continue;
> > + else
> > + break;
> > }
> > -#ifdef CONFIG_SLUB_CPU_PARTIAL
> > - if (partial_slabs > s->cpu_partial_slabs / 2)
> > - break;
> > -#else
> > - break;
> > -#endif
>
> I'd suggest that instead of the changes done in this patch, you only
> change this part above to:
>
> if ((slub_get_cpu_partial(s) == 0) ||
> (partial_slabs > slub_get_cpu_partial(s) / 2))
> break;
>
> That gets rid of the #ifdef and also fixes a weird corner case: if we
> set cpu_partial_slabs to 0 via sysfs, we still queue at least one slab
> here.
Oh, yes. Will update.
>
> It could be tempting to use >= instead of > to achieve the same effect but
> that would have unintended performance effects that would best be evaluated
> separately.
I can run a test to measure the Amean changes. But at the x86 assembly
level there should be no extra instructions with ">=".
I did a quick check: ">=" compiles to a "jle" instruction, while ">"
compiles to "jl". No additional instructions are involved, so there
should be no performance effect on x86.
Thanks,
Xiongwei
>
> >
> > + put_cpu_partial(s, slab, 0);
> > + stat(s, CPU_PARTIAL_NODE);
> > + partial_slabs++;
> > +
> > + if (partial_slabs > slub_get_cpu_partial(s) / 2)
> > + break;
> > }
> > spin_unlock_irqrestore(&n->list_lock, flags);
> > return partial;