Re: [PATCH] xfrm: Add pre-encap fragmentation for packet offload

From: Steffen Klassert
Date: Tue Nov 26 2024 - 07:59:43 EST


On Tue, Nov 26, 2024 at 10:35:13AM +0200, Leon Romanovsky wrote:
> On Tue, Nov 26, 2024 at 09:09:03AM +0200, Ilia Lin wrote:
> > On Mon, Nov 25, 2024 at 9:43 PM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> > >
> > > On Mon, Nov 25, 2024 at 11:26:14AM +0200, Ilia Lin wrote:
> > > > On Sun, Nov 24, 2024 at 2:04 PM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Sun, Nov 24, 2024 at 11:35:31AM +0200, Ilia Lin wrote:
> > > > > > In packet offload mode the raw packets will be sent to the NIC,
> > > > > > and will not return to the network stack. In the event of crossing
> > > > > > the MTU size after the encapsulation, the NIC HW may not be
> > > > > > able to fragment the final packet.
> > > > >
> > > > > Yes, HW doesn't know how to handle these packets.
> > > > >
> > > > > > Adding mandatory pre-encapsulation fragmentation for both
> > > > > > IPv4 and IPv6, if tunnel mode with packet offload is configured
> > > > > > on the state.
> > > > >
> > > > > I was under the impression that xfrm_dev_offload_ok() is responsible
> > > > > for preventing fragmentation.
> > > > >
> > > > > https://elixir.bootlin.com/linux/v6.12/source/net/xfrm/xfrm_device.c#L410
> > > >
> > > > With my change we can both support inner fragmentation or prevent it,
> > > > depending on the network device driver implementation.
> > >
> > > The thing is that fragmentation isn't desirable. Why doesn't PMTU
> > > take the headers into account, so we can rely on existing code and not
> > > add extra logic for packet offload?
> >
> > I agree that PMTU is the preferred option, but the packets may be routed
> > from a host behind the VPN, which is unaware that it transmits into an
> > IPsec tunnel, and therefore will not account for the extra headers.
>
> My basic web search shows that PMTU works correctly for IPsec tunnels too.

Yes, PMTU works correctly at least for SW and crypto offload IPsec.

>
> Steffen, do we need a special case for packet offload here? My preference
> is to make sure that we have as few special cases as possible for packet
> offload.

Looks like the problem on packet offload is that packets
bigger than MTU size are dropped before the PMTU signaling
is handled.
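
To make the size problem concrete, here is a rough sketch (not taken from
the patch or from the kernel) of how much ESP tunnel-mode encapsulation adds
to an inner packet. The function names are hypothetical. The defaults assume
an IPv4 outer header (20 bytes), the 8-byte ESP header, an 8-byte IV and a
16-byte ICV as used with AES-GCM, and 4-byte padding alignment. Other
algorithms or an IPv6 outer header change the numbers.

```python
ESP_HDR_LEN = 8       # SPI (4 bytes) + sequence number (4 bytes)
ESP_TRAILER_MIN = 2   # pad length (1 byte) + next header (1 byte)


def esp_tunnel_len(inner_len, outer_hdr=20, iv_len=8, icv_len=16, align=4):
    """Size of an inner packet after ESP tunnel-mode encapsulation.

    All defaults are illustrative assumptions (IPv4 outer header, AES-GCM).
    """
    payload = inner_len + ESP_TRAILER_MIN
    # Pad the encrypted payload up to the next multiple of the alignment.
    padded = -(-payload // align) * align
    return outer_hdr + ESP_HDR_LEN + iv_len + padded + icv_len


def max_inner_len(link_mtu, **kwargs):
    """Largest inner packet whose encapsulated form still fits link_mtu."""
    inner = link_mtu
    while inner > 0 and esp_tunnel_len(inner, **kwargs) > link_mtu:
        inner -= 1
    return inner


if __name__ == "__main__":
    print(esp_tunnel_len(1500))   # encapsulated size of a full-MTU packet
    print(max_inner_len(1500))    # inner size a sender would need to use
```

With these assumed values a 1500-byte inner packet grows to 1556 bytes, and
the largest inner packet that still fits a 1500-byte link MTU is 1446 bytes.
A host behind the VPN that sends full 1500-byte packets therefore exceeds
the MTU after encapsulation unless it learns the smaller value through PMTU
signaling.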