Re: [PATCH v2] net: skbuff: set FLAG_SKB_NO_MERGE for skbuff_fclone_cache

From: Eric Dumazet
Date: Tue Feb 27 2024 - 07:55:31 EST


On Tue, Feb 27, 2024 at 7:29 AM Huang Shijie
<shijie@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Since we do not set FLAG_SKB_NO_MERGE for skbuff_fclone_cache,
> the current skbuff_fclone_cache maybe not really allocated, it maybe
> used an exist old kmem_cache. In NUMA, the fclone allocated by
> alloc_skb_fclone() maybe in remote node.

Why is this happening in the first place ? Whab about skb->head ?

Jesper patch [1] motivation was not about NUMA., but about
fragmentation and bulk allocations/freeing.

TCP fclones are not bulk allocated/freed, so I do not understand what
your patch is doing.
You need to give more details, and experimental results.

Using SLAB_NO_MERGE does not help, I am still seeing wrong allocations
on a dual socket
host with plenty of available memory.
(either sk_buff or skb->head being allocated on the other node).

fclones might be allocated from a cpu running on node A, and freed
from a cpu running on node B.
Maybe SLUB is not properly handling this case ?

SLAB_NO_MERGE will avoid merging fclone with kmalloc-512, it does not
really help.

I think we need help from mm/slub experts, instead of trying to 'fix'
networking stacks.

Perhaps we could augment trace_kmem_cache_alloc() to record/print the nodes
of the allocated chunk (we already have the cpu number giving us the
local node).
That would give us more confidence on any fixes.

BTW SLUB is gone, time to remove FLAG_SKB_NO_MERGE and simply use SLAB_NO_MERGE

[1]
commit 0a0643164da4a1976455aa12f0a96d08ee290752
Author: Jesper Dangaard Brouer <hawk@xxxxxxxxxx>
Date: Tue Aug 15 17:17:36 2023 +0200

net: use SLAB_NO_MERGE for kmem_cache skbuff_head_cache



>
> So set FLAG_SKB_NO_MERGE for skbuff_fclone_cache to fix it.
>
> Signed-off-by: Huang Shijie <shijie@xxxxxxxxxxxxxxxxxxxxxx>
> ---
> v1 --> v2:
> set FLAG_SKB_NO_MERGE for skbuff_fclone_cache in initialization.
>
> v1: https://lkml.org/lkml/2024/2/20/121
> ---
> net/core/skbuff.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 1f918e602bc4..5e3e130fb57a 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -5013,7 +5013,8 @@ void __init skb_init(void)
> skbuff_fclone_cache = kmem_cache_create("skbuff_fclone_cache",
> sizeof(struct sk_buff_fclones),
> 0,
> - SLAB_HWCACHE_ALIGN|SLAB_PANIC,
> + SLAB_HWCACHE_ALIGN|SLAB_PANIC|
> + FLAG_SKB_NO_MERGE,
> NULL);
> /* usercopy should only access first SKB_SMALL_HEAD_HEADROOM bytes.
> * struct skb_shared_info is located at the end of skb->head,
> --
> 2.40.1
>