Re: [PATCH net v2 1/3] net: gro: add {inner_}network_offset to napi_gro_cb
From: Willem de Bruijn
Date: Tue Apr 23 2024 - 09:55:36 EST
Richard Gobert wrote:
> Willem de Bruijn wrote:
> > Richard Gobert wrote:
> >> Willem de Bruijn wrote:
> >>> Richard Gobert wrote:
> >>>> This patch adds network_offset and inner_network_offset to napi_gro_cb, and
> >>>> makes sure both are set correctly. In the common path there's only one
> >>>> write (skb_gro_reset_offset, which replaces skb_set_network_header).
> >>>>
> >>>> Signed-off-by: Richard Gobert <richardbgobert@xxxxxxxxx>
> >>>> ---
> >>>> drivers/net/geneve.c | 1 +
> >>>> drivers/net/vxlan/vxlan_core.c | 1 +
> >>>> include/net/gro.h | 18 ++++++++++++++++--
> >>>> net/8021q/vlan_core.c | 2 ++
> >>>> net/core/gro.c | 1 +
> >>>> net/ethernet/eth.c | 1 +
> >>>> net/ipv4/af_inet.c | 5 +----
> >>>> net/ipv4/gre_offload.c | 1 +
> >>>> net/ipv6/ip6_offload.c | 8 ++++----
> >>>> 9 files changed, 28 insertions(+), 10 deletions(-)
> >>>>
> >>>
> >>>> +static inline int skb_gro_network_offset(const struct sk_buff *skb)
> >>>> +{
> >>>> + return NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark];
> >>>> +}
> >>>> +
> >>>
> >>>
> >>>> @@ -236,8 +236,6 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
> >>>> if (unlikely(!iph))
> >>>> goto out;
> >>>>
> >>>> - skb_set_network_header(skb, off);
> >>>> -
> >>>
> >>> Especially for net, this is still a large patch.
> >>>
> >>> Can we avoid touching all those tunnel callbacks and just set the
> >>> offsets in inet_gro_receive and ipv6_gro_receive themselves, just
> >>> as skb_set_network_header does now:
> >>>
> >>> @@ -236,7 +236,7 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
> >>> if (unlikely(!iph))
> >>> goto out;
> >>>
> >>> - skb_set_network_header(skb, off);
> >>> + NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark] = off;
> >>>
> >>
> >> Thanks for the reply!
> >>
> >> Setting network_offset in dev_gro_receive and inner_network_offset only
> >> in the tunnel callbacks is the best option IMO. I agree that
> >> we want a small patch to net that solves the problem, although I
> >> think always using ->encap_mark in the common path is not ideal.
> >>
> >> We can avoid changing all the tunnel callbacks by always setting
> >> inner_network_offset in {ipv6,inet}_gro_receive and initialize
> >> network_offset to 0 in dev_gro_receive. It will result in a small
> >> change, without using ->encap_mark.
> >>
> >> What are your thoughts?
> >
> > That works. It's a bit ugly that inner_network_offset will always be
> > set, even if a packet only traverses inet_gro_receive once. What is
> > your concern with testing encap_mark?
> >
> > How do you want to detect in udp[46]_lib_lookup_skb which of the two
> > offsets to use? That would still be encap_mark based?
> >
>
> I'd like to minimize any potential overhead, even a small one, and this way
> we do not need to access encap_mark at all in the common path.
>
> NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark] = off;
>
> compiles to:
>
> movzx eax, byte ptr [rbx+46h]
> shr al, 1
> and eax, 1
> mov [rbx+rax*2+4Ch], r14w
>
> while
>
> NAPI_GRO_CB(skb)->inner_network_offset = off;
>
> compiles to:
>
> mov [rbx+4Eh], r14w
>
> I do plan to add a patch to net-next after this to remove the access
> entirely from inet gro callbacks. In the meantime, it looks to me like a
> reasonable patch and small enough to not raise concerns.
>
> For udp_lib_lookup I don't see a way around it so yes, it would still be
> dependent on encap_mark. Since this runs in the complete phase it's less
> concerning.
>
> Let me know if you're ok with that and I'll post a v3.
Yes, looks fine.
The main cost is the memory access, and encap_mark will be
accessed soon after in udp4_lib_lookup anyway.
I don't expect two arithmetic instructions to matter. But this code
does now have one more store: the one in dev_gro_receive.
Either way, in the noise. Both approaches look fine to me: very
concise and essentially equivalent. Choose your preferred option.
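
For reference, a rough user-space sketch of the two store styles compared above. This is not the kernel code: struct gro_cb_model, ip_receive_indexed, ip_receive_direct, dev_receive_init, lookup_network_offset and run_both are hypothetical names, the field layout is only a stand-in for napi_gro_cb, and the model assumes the outer network header sits at offset 0.

```c
#include <stdint.h>

/* Hypothetical stand-in for the relevant napi_gro_cb fields (not the kernel layout). */
struct gro_cb_model {
    unsigned int encap_mark : 1;           /* set after a tunnel header is parsed */
    union {
        struct {
            uint16_t network_offset;       /* outer network header */
            uint16_t inner_network_offset; /* inner network header */
        };
        uint16_t network_offsets[2];       /* same storage, indexed by encap_mark */
    };
};

/* Variant A: the store is indexed by encap_mark (load, mask, indexed store). */
static void ip_receive_indexed(struct gro_cb_model *cb, uint16_t off)
{
    cb->network_offsets[cb->encap_mark] = off;
}

/* Variant B, part 1: one extra store at device level. */
static void dev_receive_init(struct gro_cb_model *cb)
{
    cb->network_offset = 0;
}

/* Variant B, part 2: unconditional store to the inner slot, no encap_mark read. */
static void ip_receive_direct(struct gro_cb_model *cb, uint16_t off)
{
    cb->inner_network_offset = off;
}

/* Lookup side: both variants still select the offset by encap_mark. */
static uint16_t lookup_network_offset(const struct gro_cb_model *cb)
{
    return cb->network_offsets[cb->encap_mark];
}

/* Run one packet through both variants; inner_off < 0 means "no tunnel". */
static void run_both(int inner_off, uint16_t *a_out, uint16_t *b_out)
{
    struct gro_cb_model a = {0}, b = {0};

    ip_receive_indexed(&a, 0);             /* outer header at offset 0 (assumed) */
    dev_receive_init(&b);
    ip_receive_direct(&b, 0);

    if (inner_off >= 0) {                  /* tunnel callback marks encapsulation */
        a.encap_mark = 1;
        b.encap_mark = 1;
        ip_receive_indexed(&a, (uint16_t)inner_off);
        ip_receive_direct(&b, (uint16_t)inner_off);
    }

    *a_out = lookup_network_offset(&a);
    *b_out = lookup_network_offset(&b);
}
```

Under these assumptions both variants hand the same offset to the lookup, with or without a tunnel. The difference is only where the stores happen: variant A reads encap_mark on every receive, variant B skips that read at the price of the extra store in the device-level init.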