Re: [RFC PATCH 1/1] bpf: Add tunnel decapsulation and GSO state updates per new flags

From: Willem de Bruijn

Date: Tue Mar 10 2026 - 15:44:00 EST


Hudson, Nick wrote:
>
>
> > On 25 Feb 2026, at 15:45, Willem de Bruijn <willemdebruijn.kernel@xxxxxxxxx> wrote:
> >
> > Hudson, Nick wrote:
> >>
> >>
> >>> On 20 Feb 2026, at 21:08, Willem de Bruijn <willemdebruijn.kernel@xxxxxxxxx> wrote:
> >>>
> >>> Nick Hudson wrote:
> >>>> Enable BPF programs to properly handle GSO state when decapsulating
> >>>> tunneled packets by adding selective GSO flag clearing and a trusted
> >>>> mode for GSO handling.
> >>>>
> >>>> New decapsulation flags:
> >>>>
> >>>> - BPF_F_ADJ_ROOM_DECAP_L4_UDP: Clear UDP tunnel GSO flags
> >>>> (SKB_GSO_UDP_TUNNEL, SKB_GSO_UDP_TUNNEL_CSUM)
> >>>> - BPF_F_ADJ_ROOM_DECAP_L4_GRE: Clear GRE tunnel GSO flags
> >>>> (SKB_GSO_GRE, SKB_GSO_GRE_CSUM)
> >>>> - BPF_F_ADJ_ROOM_DECAP_IPXIP4: Clear SKB_GSO_IPXIP4 flag for
> >>>> IPv4-in-IPv4 (IPIP) and IPv6-in-IPv4 (SIT) tunnels
> >>>> - BPF_F_ADJ_ROOM_DECAP_IPXIP6: Clear SKB_GSO_IPXIP6 flag for
> >>>> IPv6-in-IPv6 and IPv4-in-IPv6 tunnels
> >>>> - BPF_F_ADJ_ROOM_NO_DODGY: Preserve gso_segs and don't set
> >>>> SKB_GSO_DODGY when the BPF program is trusted and modifications
> >>>> are known to be valid
> >>>>
> >>>> The existing anonymous enum for BPF_FUNC_skb_adjust_room flags is
> >>>> renamed to enum bpf_adj_room_flags to enable CO-RE (Compile Once -
> >>>> Run Everywhere) lookups in BPF programs.
> >>>>
> >>>> By default, bpf_skb_adjust_room sets SKB_GSO_DODGY and resets
> >>>> gso_segs to 0, forcing revalidation. The NO_DODGY flag bypasses this
> >>>> for trusted programs that guarantee GSO correctness.
> >>>>
> >>>> Usage example (decapsulating UDP tunnel with IPv4 inner packet):
> >>>> bpf_skb_adjust_room(skb, -hdr_len, BPF_ADJ_ROOM_NET,
> >>>> BPF_F_ADJ_ROOM_DECAP_L3_IPV4 |
> >>>> BPF_F_ADJ_ROOM_DECAP_L4_UDP);
> >>>
> >>> This patch is doing too much in one patch.
> >>
> >> Sure, I’ll split it up.
> >>
> >>>
> >>> Also not convinced of the need for the NO_DODGY flag.
> >>
> >> The reason for NO_DODGY is that, without it, the egress interface will see the
> >> SKB_GSO_DODGY flag. In our use case, we want to avoid marking the egress tap as
> >> NETIF_F_GSO_ROBUST, so the skb will fail skb_gso_ok() with SKB_GSO_DODGY set.
> >> When skb_gso_ok() fails, validate_xmit_skb() calls skb_gso_segment().
> >
> > I understand why you might want it. But the dodgy check has long been
> > there for a reason: because these transformations are not blindly
> > accepted by the kernel. This use case does not change that.
>
> The defence I came up with here is...
>
> - Setting NETIF_F_GSO_ROBUST on the tun/tap device is not an option. It is a device-level property, so it affects both host-to-guest and guest-to-host traffic. The former is trusted; the latter is not.
> - The host-to-guest direction is fully trusted
> - Physical NIC driver is trusted (kernel driver, hardware-validated GSO)
> - BPF program is trusted (privileged, CAP_BPF, verified by kernel)
> - Decapsulation is a trusted operation for BPF code authors
> - Bridge + TAP is internal kernel forwarding
>
> If that is still not acceptable, would protecting its use with a sysctl make it so?

Does taking the DODGY path and going through GSO have a significant
impact on your workload?

So far we have always declined to add such custom opt-outs. This is
not at all the first affected use case.

Either way, let's separate this from the main functional decap patch.
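
For readers following the thread, the feature check quoted above
(skb_gso_ok() failing once SKB_GSO_DODGY is set, unless the device
advertises NETIF_F_GSO_ROBUST) can be sketched as a small user-space C
model. This is an illustration only, not kernel code: every model_*
name and bit value below is hypothetical, the decap helper mirrors the
behaviour the patch proposes rather than anything merged, and the
model ignores the frag-list check and whatever skb_gso_segment() does
after the check fails.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for illustration; the kernel's constants differ. */
enum model_gso_type {
	MODEL_GSO_TCPV4      = 1u << 0,
	MODEL_GSO_DODGY      = 1u << 1,
	MODEL_GSO_UDP_TUNNEL = 1u << 2,
};

/*
 * Simplifying assumption: each device feature bit has the same value as
 * the GSO type bit it covers.
 */
#define MODEL_F_TSO            MODEL_GSO_TCPV4
#define MODEL_F_GSO_ROBUST     MODEL_GSO_DODGY
#define MODEL_F_GSO_UDP_TUNNEL MODEL_GSO_UDP_TUNNEL

/* True when the device advertises a feature for every GSO type bit set. */
static inline bool model_gso_ok(uint32_t features, uint32_t gso_type)
{
	return (features & gso_type) == gso_type;
}

/* True when transmit validation would fall back to software GSO. */
static inline bool model_needs_sw_gso(uint32_t features, uint32_t gso_type)
{
	return gso_type != 0 && !model_gso_ok(features, gso_type);
}

/*
 * Model of a UDP tunnel decap without the proposed NO_DODGY flag:
 * clear the tunnel bit, then mark the packet as dodgy.
 */
static inline uint32_t model_decap_udp_tunnel(uint32_t gso_type)
{
	gso_type &= ~(uint32_t)MODEL_GSO_UDP_TUNNEL;
	gso_type |= MODEL_GSO_DODGY;
	return gso_type;
}
```

In this model, a packet that starts as MODEL_GSO_TCPV4 |
MODEL_GSO_UDP_TUNNEL comes out of model_decap_udp_tunnel() as
MODEL_GSO_TCPV4 | MODEL_GSO_DODGY. A device with only MODEL_F_TSO then
makes model_needs_sw_gso() return true, while adding
MODEL_F_GSO_ROBUST makes it return false.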