Re: [PATCH v2 3/3] mm/slub: simplify get_partial_node()

From: Vlastimil Babka
Date: Thu Apr 04 2024 - 05:27:02 EST


On 4/4/24 7:58 AM, xiongwei.song@xxxxxxxxxxxxx wrote:
> From: Xiongwei Song <xiongwei.song@xxxxxxxxxxxxx>
>
> The break conditions for filling the cpu partial list can be made more
> readable and simpler.
>
> If slub_get_cpu_partial() returns 0, we know that the cpu partial list
> does not need to be filled, so we should break from the loop. Likewise,
> we should break from the loop once we have added enough cpu partial
> slabs.
>
> Meanwhile, the logic above gets rid of the #ifdef and also fixes a weird
> corner case: if we set cpu_partial_slabs to 0 from sysfs, we would still
> add at least one slab to the cpu partial list here.
>
> Signed-off-by: Xiongwei Song <xiongwei.song@xxxxxxxxxxxxx>
> ---
>
> The measurement below compares the performance effects of breaking from
> the cpu partial filling loop with either of the following conditions:
>
> Condition 1:
> Break when the count of added cpu partial slabs is greater than
> cpu_partial_slabs/2:
> (partial_slabs > slub_get_cpu_partial(s) / 2)
>
> Condition 2:
> Break when the count of added cpu partial slabs is greater than or equal
> to cpu_partial_slabs/2:
> (partial_slabs >= slub_get_cpu_partial(s) / 2)
>
> The choice of break condition affects how many cpu partial slabs are put
> on the cpu partial list.
>
> The tests were run on an "Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz" with
> 16 cores, running Ubuntu 22.04.
>
> hackbench-process-pipes
>                  6.9.0-rc2 (with ">")    6.9.0-rc2 (with ">=")
> Amean     1       0.0373 (   0.00%)       0.0356 *   4.60%*
> Amean     4       0.0984 (   0.00%)       0.1014 *  -3.05%*
> Amean     7       0.1803 (   0.00%)       0.1851 *  -2.69%*
> Amean    12       0.2947 (   0.00%)       0.3141 *  -6.59%*
> Amean    21       0.4577 (   0.00%)       0.4927 *  -7.65%*
> Amean    30       0.6326 (   0.00%)       0.6649 *  -5.10%*
> Amean    48       0.9396 (   0.00%)       0.9884 *  -5.20%*
> Amean    64       1.2321 (   0.00%)       1.3004 *  -5.54%*
>
> hackbench-process-sockets
>                  6.9.0-rc2 (with ">")    6.9.0-rc2 (with ">=")
> Amean     1       0.0609 (   0.00%)       0.0623 *  -2.35%*
> Amean     4       0.2107 (   0.00%)       0.2140 *  -1.56%*
> Amean     7       0.3754 (   0.00%)       0.3966 *  -5.63%*
> Amean    12       0.6456 (   0.00%)       0.6734 *  -4.32%*
> Amean    21       1.1440 (   0.00%)       1.1769 *  -2.87%*
> Amean    30       1.6629 (   0.00%)       1.7031 *  -2.42%*
> Amean    48       2.7321 (   0.00%)       2.7897 *  -2.11%*
> Amean    64       3.7397 (   0.00%)       3.7640 *  -0.65%*
>
> It seems there is a slight performance penalty when using ">=" to break
> out of the loop. Hence, we should keep using ">" here.

Thanks for evaluating that; I suspected that would be the case, so we
should not change that performance aspect as part of a cleanup.
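
For anyone following along: the #ifdef can go away because
slub_get_cpu_partial(), introduced earlier in this series, is a constant
0 when CONFIG_SLUB_CPU_PARTIAL is disabled, so the "== 0" check subsumes
the unconditional break. A minimal sketch of that helper, assuming the
shape from patch 2/3 (not quoted here):

#ifdef CONFIG_SLUB_CPU_PARTIAL
static inline unsigned int slub_get_cpu_partial(struct kmem_cache *s)
{
	/* limit set at cache creation or via /sys/kernel/slab/<cache>/cpu_partial */
	return s->cpu_partial_slabs;
}
#else
static inline unsigned int slub_get_cpu_partial(struct kmem_cache *s)
{
	/* no cpu partial lists exist, so there is never anything to fill */
	return 0;
}
#endif

With that shape, writing 0 to the cpu_partial sysfs file and building
without CONFIG_SLUB_CPU_PARTIAL behave identically at this call site,
which is what fixes the corner case mentioned in the changelog.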

> ---
> mm/slub.c | 9 +++------
> 1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 590cc953895d..6beff3b1e22c 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2619,13 +2619,10 @@ static struct slab *get_partial_node(struct kmem_cache *s,
>  			stat(s, CPU_PARTIAL_NODE);
>  			partial_slabs++;
>  		}
> -#ifdef CONFIG_SLUB_CPU_PARTIAL
> -		if (partial_slabs > s->cpu_partial_slabs / 2)
> -			break;
> -#else
> -		break;
> -#endif
> 
> +		if ((slub_get_cpu_partial(s) == 0) ||
> +		    (partial_slabs > slub_get_cpu_partial(s) / 2))
> +			break;
>  	}
>  	spin_unlock_irqrestore(&n->list_lock, flags);
>  	return partial;

After looking at the results and your v1 again, I arrived at a
modification that incorporates the core v1 idea without reintroducing
kmem_cache_has_cpu_partial(). The modified patch is below. Is it OK with
you? I have pushed the whole series with this modification to
slab/for-next for now.

----8<-----
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2614,18 +2614,17 @@ static struct slab *get_partial_node(struct kmem_cache *s,
 		if (!partial) {
 			partial = slab;
 			stat(s, ALLOC_FROM_PARTIAL);
+			if (slub_get_cpu_partial(s) == 0) {
+				break;
+			}
 		} else {
 			put_cpu_partial(s, slab, 0);
 			stat(s, CPU_PARTIAL_NODE);
-			partial_slabs++;
-		}
-#ifdef CONFIG_SLUB_CPU_PARTIAL
-		if (partial_slabs > s->cpu_partial_slabs / 2)
-			break;
-#else
-		break;
-#endif
 
+			if (++partial_slabs > slub_get_cpu_partial(s) / 2) {
+				break;
+			}
+		}
 	}
 	spin_unlock_irqrestore(&n->list_lock, flags);
 	return partial;
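
For clarity, the filling part of get_partial_node() should end up
looking roughly like this after the modification (reconstructed from the
diff above, with the surrounding loop elided):

	if (!partial) {
		/* the first usable slab is returned to the caller */
		partial = slab;
		stat(s, ALLOC_FROM_PARTIAL);
		/* cpu partial disabled or set to 0: stop after one slab */
		if (slub_get_cpu_partial(s) == 0) {
			break;
		}
	} else {
		/* subsequent slabs fill the cpu partial list */
		put_cpu_partial(s, slab, 0);
		stat(s, CPU_PARTIAL_NODE);

		/* stop once more than half the configured limit was added */
		if (++partial_slabs > slub_get_cpu_partial(s) / 2) {
			break;
		}
	}

This keeps the v2 semantics (break immediately when there is no cpu
partial list to fill, otherwise stop once more than cpu_partial_slabs / 2
slabs were added) while moving each check into the branch it actually
applies to, so the "== 0" test is only evaluated for the first slab taken.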