Re: [PATCH] net: skbuff: allocate the fclone in the current NUMA node

From: Eric Dumazet
Date: Sat Feb 24 2024 - 14:08:22 EST


On Tue, Feb 20, 2024 at 9:37 AM Shijie Huang
<shijie@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
>
> On 2024/2/20 16:17, Eric Dumazet wrote:
> > On Tue, Feb 20, 2024 at 7:26 AM Shijie Huang
> > <shijie@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> >>
> >> On 2024/2/20 13:32, Eric Dumazet wrote:
> >>> On Tue, Feb 20, 2024 at 3:18 AM Huang Shijie
> >>> <shijie@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> >>>> The current code passes NUMA_NO_NODE to __alloc_skb(), we found
> >>>> it may creates fclone SKB in remote NUMA node.
> >>> This is intended (WAI)
> >> Okay, thanks a lot.
> >>
> >> It seems I should fix the issue in other code, not in the networking code.
> >>
> >>> What about the NUMA policies of the current thread ?
> >> We use "numactl -m 0" for memcached, so the NUMA policy should allocate
> >> fclones on node 0, but we can see that many fclones were allocated on node 1.
> >>
> >> We have enough memory to allocate these fclones on node 0.
> >>
> >>> Has NUMA_NO_NODE behavior changed recently?
> >> I guess not.
> >>> What does "it may creates" mean? Please be more specific.
> >> When we test with memcached on a NUMA system, roughly 20% to 30% of the
> >> fclones are allocated on a remote NUMA node.
> > Interesting, how was it measured exactly ?
>
> I created a private patch to record the status for each fclone allocation.
>
>
> > Are you using SLUB or SLAB ?
>
> I think I use SLUB (CONFIG_SLUB=y, CONFIG_SLAB_MERGE_DEFAULT=y,
> CONFIG_SLUB_CPU_PARTIAL=y).
>

A similar issue comes from net_tx_action() calling __napi_kfree_skb() on
arbitrary skbs, including ones that were allocated on a different NUMA node.

This pollutes the per-cpu caches with suboptimally placed sk_buffs :/

Although this should only affect skbs freed via __napi_kfree_skb(), not fclones?
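
To make the pollution concrete, here is a toy userspace model. It is not
kernel code: every toy_* name is made up for illustration, and the real
per-cpu skb cache is more involved. It only shows the general pattern, where
a cache that accepts frees without checking the object's home node ends up
handing remote objects back to local allocations.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CACHE_SLOTS 8

/* Stand-in for an sk_buff; home_node is the node its memory lives on. */
struct toy_skb {
	int home_node;
};

/* Hypothetical per-CPU free cache: a LIFO stack of recycled objects. */
struct toy_cpu_cache {
	int node;				/* node this CPU belongs to */
	struct toy_skb *slot[TOY_CACHE_SLOTS];
	int count;
};

/* Free into the local cache without looking at the object's home node. */
static int toy_cache_free(struct toy_cpu_cache *c, struct toy_skb *skb)
{
	if (c->count == TOY_CACHE_SLOTS)
		return -1;			/* cache full */
	c->slot[c->count++] = skb;
	return 0;
}

/* Allocate by recycling whatever was freed last (NULL if empty). */
static struct toy_skb *toy_cache_alloc(struct toy_cpu_cache *c)
{
	return c->count ? c->slot[--c->count] : NULL;
}

/* Count cached objects whose memory is not on this CPU's node. */
static int toy_cache_remote(const struct toy_cpu_cache *c)
{
	int i, remote = 0;

	for (i = 0; i < c->count; i++)
		if (c->slot[i]->home_node != c->node)
			remote++;
	return remote;
}

/*
 * Scenario: a CPU on node 0 frees four objects, two of which were
 * allocated on node 1. Returns the home node of the next allocation.
 */
static int toy_demo(void)
{
	struct toy_skb skbs[4] = { {0}, {1}, {0}, {1} };
	struct toy_cpu_cache cpu0 = { .node = 0 };
	int i;

	for (i = 0; i < 4; i++)
		toy_cache_free(&cpu0, &skbs[i]);

	assert(toy_cache_remote(&cpu0) == 2);	/* half the cache is remote */
	return toy_cache_alloc(&cpu0)->home_node;
}
```

In this model toy_demo() returns 1: the next allocation on the node 0 CPU
reuses memory that lives on node 1.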

commit 15fad714be86eab13e7568fecaf475b2a9730d3e
Author: Jesper Dangaard Brouer <brouer@xxxxxxxxxx>
Date: Mon Feb 8 13:15:04 2016 +0100

net: bulk free SKBs that were delay free'ed due to IRQ context

What about :

diff --git a/net/core/dev.c b/net/core/dev.c
index c588808be77f563c429eb4a2eaee5c8062d99582..63165138c6f690e14520f11e32dc16f2845abad4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5162,11 +5162,7 @@ static __latent_entropy void net_tx_action(struct softirq_action *h)
 				trace_kfree_skb(skb, net_tx_action,
 						get_kfree_skb_cb(skb)->reason);
 
-			if (skb->fclone != SKB_FCLONE_UNAVAILABLE)
-				__kfree_skb(skb);
-			else
-				__napi_kfree_skb(skb,
-						 get_kfree_skb_cb(skb)->reason);
+			__kfree_skb(skb);
 		}
 	}