Re: [PATCH bpf-next v2 0/8] Support defragmenting IPv(4|6) packets in BPF
From: Daniel Xu
Date: Tue Mar 07 2023 - 14:56:25 EST
Hi Alexei,
(cc netfilter maintainers)
On Mon, Mar 06, 2023 at 08:17:20PM -0800, Alexei Starovoitov wrote:
> On Tue, Feb 28, 2023 at 3:17 PM Daniel Xu <dxu@xxxxxxxxx> wrote:
> >
> > > Have you considered to skb redirect to another netdev that does ip defrag?
> > > Like macvlan does it under some conditions. This can be generalized.
> >
> > I had not considered that yet. Are you suggesting adding a new
> > passthrough netdev thing that'll defrags? I looked at the macvlan driver
> > and it looks like it defrags to handle some multicast corner case.
>
> Something like that. A netdev that bpf prog can redirect too.
> It will consume ip frags and eventually will produce reassembled skb.
>
> The kernel ip_defrag logic has timeouts, counters, rhashtable
> with thresholds, etc. All of them are per netns.
> Just another ip_defrag_user will still share rhashtable
> with its limits. The kernel can even do icmp_send().
> ip_defrag is not a kfunc. It's a big block with plenty of kernel
> wide side effects.
> I really don't think we can alloc_skb, copy_skb, and ip_defrag it.
> It messes with the stack too much.
> It's also not clear to me when skb is reassembled and how bpf sees it.
> "redirect into reassembling netdev" and attaching bpf prog to consume
> that skb is much cleaner imo.
> May be there are other ways to use ip_defrag, but certainly not like
> synchronous api helper.
I was giving the virtual netdev idea some thought this morning and I
thought I'd give the netfilter approach a deeper look.
From my reading (I'll run some tests later), it looks like netfilter
will defrag all ipv4/ipv6 packets in any netns with conntrack enabled.
It appears to do so in NF_INET_PRE_ROUTING.
Unfortunately, that runs after the tc hooks. Fortunately, with the new
BPF netfilter hooks, I think we can make defrag work outside of BPF
kfuncs, like you want. The NF_INET_FORWARD hook also works well for my
router use case.
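To illustrate the ordering argument, here is a rough userspace sketch
(not kernel code). It assumes defrag is enabled in the netns and that it
is registered at NF_INET_PRE_ROUTING with priority
NF_IP_PRI_CONNTRACK_DEFRAG (-400), as in the uapi headers. The FAKE_ and
fake_ names are made up for this example, and I have not tested the
real behavior yet.

```c
#include <stdbool.h>

/* Values mirror enum nf_inet_hooks; the FAKE_ names are local to this sketch. */
enum fake_hook {
	FAKE_PRE_ROUTING = 0,
	FAKE_LOCAL_IN = 1,
	FAKE_FORWARD = 2,
};

/* Mirrors NF_IP_PRI_CONNTRACK_DEFRAG; lower priorities run first. */
#define FAKE_PRI_DEFRAG (-400)

/*
 * For a received packet, would a prog attached at (hook, priority) run
 * after the defrag hook? Assumption: defrag is enabled in this netns.
 */
static bool fake_sees_reassembled(enum fake_hook hook, int priority)
{
	/* Same hook as defrag: only a strictly later priority is safe.
	 * Equal priority is treated as "not guaranteed" here. */
	if (hook == FAKE_PRE_ROUTING)
		return priority > FAKE_PRI_DEFRAG;

	/* LOCAL_IN and FORWARD are traversed after PRE_ROUTING. */
	return true;
}
```

Under these assumptions, a prog at the forward hook would see reassembled
packets regardless of its priority, while a prog at pre-routing would
need a priority after -400.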
One thing we would need, though, is (probably kfunc) wrappers around
nf_defrag_ipv4_enable() and nf_defrag_ipv6_enable(), so that BPF progs
do not transitively depend on defrag support being enabled by other
netfilter modules.
The exact mechanism would probably need some thinking, as the above
functions kinda rely on module_init() and module_exit() semantics. We
cannot make the prog bump the refcnt every time it runs -- it would
overflow. And it would be nice to automatically drop the refcnt when
the prog is unloaded.
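One possible shape, as a userspace model only: take the reference once
when the prog is attached and drop it once when it is detached, so the
per-packet path never touches the count. Every name below (fake_netns,
fake_link, fake_defrag_enable(), and so on) is hypothetical and only
stands in for the real nf_defrag_ipv{4,6}_enable()/disable() calls; this
is not a proposed API.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the per-netns count of defrag users. */
struct fake_netns {
	unsigned int defrag_users;
};

/* Hypothetical model of one attachment of a BPF netfilter prog. */
struct fake_link {
	struct fake_netns *net;
	bool holds_defrag;	/* true while this link owns one reference */
	unsigned long runs;
};

static void fake_defrag_enable(struct fake_netns *net)
{
	net->defrag_users++;
}

static void fake_defrag_disable(struct fake_netns *net)
{
	assert(net->defrag_users > 0);
	net->defrag_users--;
}

/* Attach: take exactly one reference for the lifetime of the link. */
static void fake_link_attach(struct fake_link *link, struct fake_netns *net)
{
	link->net = net;
	link->runs = 0;
	fake_defrag_enable(net);
	link->holds_defrag = true;
}

/* Run: the hot path only checks the reference, it never bumps it. */
static void fake_link_run(struct fake_link *link)
{
	assert(link->holds_defrag);
	link->runs++;
}

/* Detach/unload: drop the reference automatically, at most once. */
static void fake_link_detach(struct fake_link *link)
{
	if (link->holds_defrag) {
		fake_defrag_disable(link->net);
		link->holds_defrag = false;
	}
}
```

With this shape the count is bounded by the number of attachments rather
than the number of packets, which mirrors what module_init() and
module_exit() give the existing users.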
Once the netfilter prog type series lands, I can get that discussion
started. Unless Daniel feels strongly that we should continue with
the approach in this patchset, I am leaning towards dropping it in
favor of the netfilter approach.
Thanks,
Daniel