Re: [PATCH bpf-next 3/6] bpf, net, frags: Add bpf_ip_check_defrag() kfunc

From: Daniel Xu
Date: Thu Dec 15 2022 - 18:58:45 EST

Next message: Damien Le Moal: "Re: [PATCH V10 1/8] block, bfq: split sync bfq_queues on a per-actuator basis"
Previous message: Serge Semin: "Re: linux-next: manual merge of the pci tree with the arm-soc tree"
In reply to: Daniel Borkmann: "Re: [PATCH bpf-next 3/6] bpf, net, frags: Add bpf_ip_check_defrag() kfunc"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi Daniel,

Thanks for taking a look!

On Thu, Dec 15, 2022 at 11:31:52PM +0100, Daniel Borkmann wrote:
> Hi Daniel,
>
> Thanks for working on this!
>
> On 12/15/22 12:25 AM, Daniel Xu wrote:
> [...]
> > +#include <linux/bpf.h>
> > +#include <linux/btf_ids.h>
> > +#include <linux/ip.h>
> > +#include <linux/filter.h>
> > +#include <linux/netdevice.h>
> > +#include <net/ip.h>
> > +#include <net/sock.h>
> > +
> > +__diag_push();
> > +__diag_ignore_all("-Wmissing-prototypes",
> > + "Global functions as their definitions will be in ip_fragment BTF");
> > +
> > +/* bpf_ip_check_defrag - Defragment an ipv4 packet
> > + *
> > + * This helper takes an skb as input. If this skb successfully reassembles
> > + * the original packet, the skb is updated to contain the original, reassembled
> > + * packet.
> > + *
> > + * Otherwise (on error or incomplete reassembly), the input skb remains
> > + * unmodified.
> > + *
> > + * Parameters:
> > + * @ctx - Pointer to program context (skb)
> > + * @netns - Child network namespace id. If value is a negative signed
> > + * 32-bit integer, the netns of the device in the skb is used.
> > + *
> > + * Return:
> > + * 0 on successfully reassembly or non-fragmented packet. Negative value on
> > + * error or incomplete reassembly.
> > + */
> > +int bpf_ip_check_defrag(struct __sk_buff *ctx, u64 netns)
>
> small nit: for sk lookup helper we've used u32 netns_id, would be nice to have
> this consistent here as well.

Hmm, when I grep I see the sk lookup helpers using a u64 as well:

$ rg "u64 netns" ./include/uapi/linux/bpf.h
3394: * struct bpf_sock *bpf_sk_lookup_tcp(void *ctx, struct bpf_sock_tuple *tuple, u32 tuple_size, u64 netns, u64 flags)
3431: * struct bpf_sock *bpf_sk_lookup_udp(void *ctx, struct bpf_sock_tuple *tuple, u32 tuple_size, u64 netns, u64 flags)
3629: * struct bpf_sock *bpf_skc_lookup_tcp(void *ctx, struct bpf_sock_tuple *tuple, u32 tuple_size, u64 netns, u64 flags)
6238: __u64 netns_dev;
6239: __u64 netns_ino;
6274: __u64 netns_dev;
6275: __u64 netns_ino;
$ rg "u32 netns" ./include/uapi/linux/bpf.h
6335: __u32 netns_ino;

Am I looking at the right place?

>
> > +{
> > + struct sk_buff *skb = (struct sk_buff *)ctx;
> > + struct sk_buff *skb_cpy, *skb_out;
> > + struct net *caller_net;
> > + struct net *net;
> > + int mac_len;
> > + void *mac;
> > +
> > + if (unlikely(!((s32)netns < 0 || netns <= S32_MAX)))
> > + return -EINVAL;
> > +
> > + caller_net = skb->dev ? dev_net(skb->dev) : sock_net(skb->sk);
> > + if ((s32)netns < 0) {
> > + net = caller_net;
> > + } else {
> > + net = get_net_ns_by_id(caller_net, netns);
> > + if (unlikely(!net))
> > + return -EINVAL;
> > + }
> > +
> > + mac_len = skb->mac_len;
> > + skb_cpy = skb_copy(skb, GFP_ATOMIC);
> > + if (!skb_cpy)
> > + return -ENOMEM;
>
> Given slow path, this idea is expensive but okay. Maybe in future it could be lifted
> which might be a bigger lift to teach verifier that input ctx cannot be accessed
> anymore.. but then frags are very much discouraged either way and bpf_ip_check_defrag()
> might only apply in corner case situations (like DNS, etc).

Originally I did go the route of teaching the verifier:

* https://github.com/danobi/linux/commit/35a66af8d54cca647b0adfc7c1da7105d2603dde
* https://github.com/danobi/linux/commit/e8c86ea75e2ca8f0631632d54ef763381308711e
* https://github.com/danobi/linux/commit/972bcf769f41fbfa7f84ce00faf06b5b57bc6f7a

It didn't seem too bad on the verifier front (or maybe I got it wrong),
but it seemed a bit unwieldy to wire ctx validity information back out
of the program given ip_check_defrag() may kfree() the skb (so stuffing
data inside skb->cb wouldn't work).

And yeah, given frags are highly discouraged, didn't seem like too bad
of a tradeoff.

>
> > + skb_out = ip_check_defrag(net, skb_cpy, IP_DEFRAG_BPF);
> > + if (IS_ERR(skb_out))
> > + return PTR_ERR(skb_out);
>
> Looks like ip_check_defrag() can gracefully handle IPv6 packet. It will just return back
> skb_cpy pointer in that case. However, this brings me to my main complaint.. I don't
> think we should merge anything IPv4-related without also having IPv6 equivalent support,
> otherwise we're building up tech debt, so pls also add support for the latter.

Ok, I'll take a crack at ipv6 support too. Most likely in the form of
another kfunc, depending on how the ipv6 frag infra is set up.

I'll be in and out during the holidays so v2 will most likely come some
time in the new year.

[...]

Thanks,
Daniel

Next message: Damien Le Moal: "Re: [PATCH V10 1/8] block, bfq: split sync bfq_queues on a per-actuator basis"
Previous message: Serge Semin: "Re: linux-next: manual merge of the pci tree with the arm-soc tree"
In reply to: Daniel Borkmann: "Re: [PATCH bpf-next 3/6] bpf, net, frags: Add bpf_ip_check_defrag() kfunc"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]