Re: [PATCH bpf-next 3/6] bpf, net, frags: Add bpf_ip_check_defrag() kfunc

From: Daniel Borkmann
Date: Thu Dec 15 2022 - 17:32:20 EST


Hi Daniel,

Thanks for working on this!

On 12/15/22 12:25 AM, Daniel Xu wrote:
[...]
+#include <linux/bpf.h>
+#include <linux/btf_ids.h>
+#include <linux/ip.h>
+#include <linux/filter.h>
+#include <linux/netdevice.h>
+#include <net/ip.h>
+#include <net/sock.h>
+
+__diag_push();
+__diag_ignore_all("-Wmissing-prototypes",
+ "Global functions as their definitions will be in ip_fragment BTF");
+
+/* bpf_ip_check_defrag - Defragment an ipv4 packet
+ *
+ * This helper takes an skb as input. If this skb successfully reassembles
+ * the original packet, the skb is updated to contain the original, reassembled
+ * packet.
+ *
+ * Otherwise (on error or incomplete reassembly), the input skb remains
+ * unmodified.
+ *
+ * Parameters:
+ * @ctx - Pointer to program context (skb)
+ * @netns - Child network namespace id. If value is a negative signed
+ * 32-bit integer, the netns of the device in the skb is used.
+ *
+ * Return:
+ * 0 on successfully reassembly or non-fragmented packet. Negative value on
+ * error or incomplete reassembly.
+ */
+int bpf_ip_check_defrag(struct __sk_buff *ctx, u64 netns)

small nit: for sk lookup helper we've used u32 netns_id, would be nice to have
this consistent here as well.

+{
+ struct sk_buff *skb = (struct sk_buff *)ctx;
+ struct sk_buff *skb_cpy, *skb_out;
+ struct net *caller_net;
+ struct net *net;
+ int mac_len;
+ void *mac;
+
+ if (unlikely(!((s32)netns < 0 || netns <= S32_MAX)))
+ return -EINVAL;
+
+ caller_net = skb->dev ? dev_net(skb->dev) : sock_net(skb->sk);
+ if ((s32)netns < 0) {
+ net = caller_net;
+ } else {
+ net = get_net_ns_by_id(caller_net, netns);
+ if (unlikely(!net))
+ return -EINVAL;
+ }
+
+ mac_len = skb->mac_len;
+ skb_cpy = skb_copy(skb, GFP_ATOMIC);
+ if (!skb_cpy)
+ return -ENOMEM;

Given slow path, this idea is expensive but okay. Maybe in future it could be lifted
which might be a bigger lift to teach verifier that input ctx cannot be accessed
anymore.. but then frags are very much discouraged either way and bpf_ip_check_defrag()
might only apply in corner case situations (like DNS, etc).

+ skb_out = ip_check_defrag(net, skb_cpy, IP_DEFRAG_BPF);
+ if (IS_ERR(skb_out))
+ return PTR_ERR(skb_out);

Looks like ip_check_defrag() can gracefully handle IPv6 packet. It will just return back
skb_cpy pointer in that case. However, this brings me to my main complaint.. I don't
think we should merge anything IPv4-related without also having IPv6 equivalent support,
otherwise we're building up tech debt, so pls also add support for the latter.

+ skb_morph(skb, skb_out);
+ kfree_skb(skb_out);
+
+ /* ip_check_defrag() does not maintain mac header, so push empty header
+ * in so prog sees the correct layout. The empty mac header will be
+ * later pulled from cls_bpf.
+ */
+ mac = skb_push(skb, mac_len);
+ memset(mac, 0, mac_len);
+ bpf_compute_data_pointers(skb);
+
+ return 0;
+}
+

Thanks,
Daniel