[PATCH net] openvswitch: Set the skbuff pkt_type for proper pmtud support.

From: Aaron Conole
Date: Fri Mar 22 2024 - 15:06:23 EST


Open vSwitch is originally intended to switch at layer 2, only dealing with
Ethernet frames. With the introduction of l3 tunnels support, it crossed
into the realm of needing to care a bit about some routing details when
making forwarding decisions. If an oversized packet would need to be
fragmented during this forwarding decision, there is a chance for pmtu
to get involved and generate a routing exception. This is gated by the
skbuff->pkt_type field.

When a flow is already loaded into the openvswitch module this field is
set up and transitioned properly as a packet moves from one port to
another. In the case that a packet execute is invoked after a flow is
newly installed this field is not properly initialized. This causes the
pmtud mechanism to omit sending the required exception messages across
the tunnel boundary and a second attempt needs to be made to make sure
that the routing exception is properly setup. To fix this, we set the
outgoing packet's pkt_type to PACKET_OUTGOING, since it can only get
to the openvswitch module via a port device or packet command.

This issue is periodically encountered in complex setups, such as large
openshift deployments, where multiple sets of tunnel traversal occurs.
A way to recreate this is with the ovn-heater project that can setup
a networking environment which mimics such large deployments. In that
environment, without this patch, we can see:

./ovn_cluster.sh start
podman exec ovn-chassis-1 ip r a 170.168.0.5/32 dev eth1 mtu 1200
podman exec ovn-chassis-1 ip netns exec sw01p1 ip r flush cache
podman exec ovn-chassis-1 ip netns exec sw01p1 ping 21.0.0.3 -M do -s 1300 -c2
PING 21.0.0.3 (21.0.0.3) 1300(1328) bytes of data.
From 21.0.0.3 icmp_seq=2 Frag needed and DF set (mtu = 1142)

--- 21.0.0.3 ping statistics ---
2 packets transmitted, 0 received, +1 errors, 100% packet loss, time 1017ms

Using tcpdump, we can also see the expected ICMP FRAG_NEEDED message is not
sent into the server.

With this patch, setting the pkt_type, we see the following:

podman exec ovn-chassis-1 ip netns exec sw01p1 ping 21.0.0.3 -M do -s 1300 -c2
PING 21.0.0.3 (21.0.0.3) 1300(1328) bytes of data.
From 21.0.0.3 icmp_seq=1 Frag needed and DF set (mtu = 1222)
ping: local error: message too long, mtu=1222

--- 21.0.0.3 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1061ms

In this case, the first ping request receives the FRAG_NEEDED message and
a local routing exception is created.

Reported-at: https://issues.redhat.com/browse/FDP-164
Fixes: 58264848a5a7 ("openvswitch: Add vxlan tunneling support.")
Signed-off-by: Aaron Conole <aconole@xxxxxxxxxx>
---
NOTE: An alternate approach would be to add a netlink attribute to preserve
pkt_type across the kernel->user boundary, but that does require some
userspace cooperation.

net/openvswitch/actions.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 6fcd7e2ca81fe..952c6292100d0 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -936,6 +936,8 @@ static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port,
pskb_trim(skb, ovs_mac_header_len(key));
}

+ skb->pkt_type = PACKET_OUTGOING;
+
if (likely(!mru ||
(skb->len <= mru + vport->dev->hard_header_len))) {
ovs_vport_send(vport, skb, ovs_key_mac_proto(key));
--
2.41.0