Re: [PATCH net v1 2/2] net: gro: add p_off param in *_gro_complete

From: Willem de Bruijn
Date: Sat Apr 13 2024 - 14:46:17 EST


Richard Gobert wrote:
> Commits a602456 ("udp: Add GRO functions to UDP socket") and 57c67ff ("udp:
> additional GRO support") introduce incorrect usage of {ip,ipv6}_hdr in the
> complete phase of gro. The functions always return skb->network_header,
> which in the case of encapsulated packets at the gro complete phase, is
> always set to the innermost L3 of the packet. That means that calling
> {ip,ipv6}_hdr for skbs which completed the GRO receive phase (both in
> gro_list and *_gro_complete) when parsing an encapsulated packet's _outer_
> L3/L4 may return an unexpected value.
>
> This incorrect usage leads to a bug in GRO's UDP socket lookup.
> udp{4,6}_lib_lookup_skb functions use ip_hdr/ipv6_hdr respectively. These
> *_hdr functions return network_header which will point to the innermost L3,
> resulting in the wrong offset being used in __udp{4,6}_lib_lookup with
> encapsulated packets.
>
> To fix this issue p_off param is used in *_gro_complete to pass off the
> offset of the previous layer.
>
> Reproduction example:
>
> Endpoint configuration example (fou + local address bind)
>
> # ip fou add port 6666 ipproto 4
> # ip link add name tun1 type ipip remote 2.2.2.1 local 2.2.2.2 encap fou encap-dport 5555 encap-sport 6666 mode ipip
> # ip link set tun1 up
> # ip a add 1.1.1.2/24 dev tun1
>
> Netperf TCP_STREAM result on net-next before patch is applied:
>
> net-next main, GRO enabled:
> $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5
> Recv Send Send
> Socket Socket Message Elapsed
> Size Size Size Time Throughput
> bytes bytes bytes secs. 10^6bits/sec
>
> 131072 16384 16384 5.28 2.37
>
> net-next main, GRO disabled:
> $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5
> Recv Send Send
> Socket Socket Message Elapsed
> Size Size Size Time Throughput
> bytes bytes bytes secs. 10^6bits/sec
>
> 131072 16384 16384 5.01 2745.06
>
> patch applied, GRO enabled:
> $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5
> Recv Send Send
> Socket Socket Message Elapsed
> Size Size Size Time Throughput
> bytes bytes bytes secs. 10^6bits/sec
>
> 131072 16384 16384 5.01 2877.38
>
> Fixes: 57c67ff4bd92 ("udp: additional GRO support")
> Suggested-by: Eric Dumazet <edumazet@xxxxxxxxxx>
> Signed-off-by: Richard Gobert <richardbgobert@xxxxxxxxx>

> diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
> index 163f94a5a58f..9c18a39b0d0c 100644
> --- a/drivers/net/geneve.c
> +++ b/drivers/net/geneve.c
> @@ -555,7 +555,7 @@ static struct sk_buff *geneve_gro_receive(struct sock *sk,
> }
>
> static int geneve_gro_complete(struct sock *sk, struct sk_buff *skb,
> - int nhoff)
> + int p_off, int nhoff)
> {
> struct genevehdr *gh;
> struct packet_offload *ptype;
> @@ -569,11 +569,12 @@ static int geneve_gro_complete(struct sock *sk, struct sk_buff *skb,
>
> /* since skb->encapsulation is set, eth_gro_complete() sets the inner mac header */
> if (likely(type == htons(ETH_P_TEB)))
> - return eth_gro_complete(skb, nhoff + gh_len);
> + return eth_gro_complete(skb, p_off, nhoff + gh_len);

Since the new field to the callback is only used between IP and
transport layer callback implementations, I think the others should
just return zero, to make it clear that the value is unused.

I still think that if the only issue is with UDP, we can just special
case those callers: pass nhoff instead of thoff in the one existing
offset argument, and compute the transport offset in the UDP function.
That would be much less code churn. But unless anyone else agrees, you
can ignore that suggestion.
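
To illustrate the alternative, here is a rough userspace sketch, not
kernel code and untested against the tree. It assumes the UDP callback
receives the network offset and derives the transport offset itself.
The helper name ipv4_thoff_from_nhoff is hypothetical and appears
nowhere in the patch. IPv6 is left out, since extension headers would
need extra handling.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): given the offset of an
 * IPv4 header inside a packet buffer, return the offset of the
 * transport header that follows it.
 *
 * The IPv4 header length (IHL) is the low nibble of the first header
 * byte, counted in 32-bit words.
 */
static int ipv4_thoff_from_nhoff(const uint8_t *data, int nhoff)
{
	uint8_t ihl = data[nhoff] & 0x0f;

	return nhoff + ihl * 4;
}
```

With a 14-byte Ethernet header and an option-less IPv4 header (first
byte 0x45), ipv4_thoff_from_nhoff(data, 14) evaluates to 34.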

> -int inet_gro_complete(struct sk_buff *skb, int nhoff)
> +int inet_gro_complete(struct sk_buff *skb, int prior_off, int nhoff)
> {
> struct iphdr *iph = (struct iphdr *)(skb->data + nhoff);
> const struct net_offload *ops;
> @@ -1667,17 +1667,17 @@ int inet_gro_complete(struct sk_buff *skb, int nhoff)
> */
> err = INDIRECT_CALL_2(ops->callbacks.gro_complete,
> tcp4_gro_complete, udp4_gro_complete,
> - skb, nhoff + sizeof(*iph));
> + skb, nhoff, nhoff + sizeof(*iph));

Indentation change

> struct sock *udp4_lib_lookup_skb(const struct sk_buff *skb,
> + int nhoff,
> __be16 sport, __be16 dport)
> {
> - const struct iphdr *iph = ip_hdr(skb);
> + const struct iphdr *iph = (const struct iphdr *)(skb->data + nhoff);

How about instead just passing the saddr and daddr, and leaving the iph
pointer to the caller (which also computes the udph pointer)?
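
Roughly what I have in mind, as a userspace sketch rather than kernel
code. struct flow4 and flow4_from_offsets are made-up names for
illustration only. The point is that the caller, which already knows
both offsets, reads the addresses and ports and hands plain values to
the lookup, so the lookup never has to interpret skb->network_header.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the values a lookup needs; not a kernel structure. */
struct flow4 {
	uint32_t saddr, daddr;	/* kept in network byte order */
	uint16_t sport, dport;	/* kept in network byte order */
};

/*
 * Hypothetical caller-side parsing: read the IPv4 addresses at nhoff
 * and the UDP ports at thoff, so only values are passed onward.
 * memcpy avoids unaligned loads from the packet buffer.
 */
static struct flow4 flow4_from_offsets(const uint8_t *data, int nhoff,
				       int thoff)
{
	struct flow4 f;

	memcpy(&f.saddr, data + nhoff + 12, sizeof(f.saddr)); /* IPv4 source */
	memcpy(&f.daddr, data + nhoff + 16, sizeof(f.daddr)); /* IPv4 destination */
	memcpy(&f.sport, data + thoff, sizeof(f.sport));      /* UDP source port */
	memcpy(&f.dport, data + thoff + 2, sizeof(f.dport));  /* UDP destination port */
	return f;
}
```

For an encapsulated packet, calling this with the outer offsets yields
the outer addresses, regardless of where the inner L3 header sits.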