Re: [PATCH v2] net: bpf: permit redirect from L3 to L2 devices at near max mtu
From: Jakub Kicinski
Date: Wed May 06 2020 - 19:55:21 EST
On Wed, 6 May 2020 16:32:59 -0700 Maciej Åenczykowski wrote:
> From: Maciej Åenczykowski <maze@xxxxxxxxxx>
>
> __bpf_skb_max_len(skb) is used from:
> bpf_skb_adjust_room
> __bpf_skb_change_tail
> __bpf_skb_change_head
>
> but in the case of forwarding we're likely calling these functions
> during receive processing on ingress and bpf_redirect()'ing at
> a later point in time to egress on another interface, thus these
> mtu checks are for the wrong device (input instead of output).
>
> This is particularly problematic if we're receiving on an L3 1500 mtu
> cellular interface, trying to add an L2 header and forwarding to
> an L3 mtu 1500 mtu wifi/ethernet device (which is thus L2 1514).
>
> The mtu check prevents us from adding the 14 byte ethernet header prior
> to forwarding the packet.
>
> After the packet has already been redirected, we'd need to add
> an additional 2nd ebpf program on the target device's egress tc hook,
> but then we'd also see non-redirected traffic and have no easy
> way to tell apart normal egress with ethernet header packets
> from forwarded ethernet headerless packets.
>
> Credits to Alexei Starovoitov for the suggestion on how to 'fix' this.
>
> This probably should be further fixed up *somehow*, just
> not at all clear how. This does at least make things work.
>
> Cc: Alexei Starovoitov <ast@xxxxxxxxxx>
> Signed-off-by: Maciej Åenczykowski <maze@xxxxxxxxxx>
> ---
> net/core/filter.c | 16 ++++++++++++++--
> 1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 7d6ceaa54d21..811aba77e24b 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3159,8 +3159,20 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
>
> static u32 __bpf_skb_max_len(const struct sk_buff *skb)
> {
> - return skb->dev ? skb->dev->mtu + skb->dev->hard_header_len :
> - SKB_MAX_ALLOC;
> + if (skb->dev) {
> + unsigned short header_len = skb->dev->hard_header_len;
> +
> + /* HACK: Treat 0 as ETH_HLEN to allow redirect while
> + * adding ethernet header from an L3 (tun, rawip, cellular)
> + * to an L2 device (tap, ethernet, wifi)
> + */
> + if (!header_len)
> + header_len = ETH_HLEN;
> +
> + return skb->dev->mtu + header_len;
> + } else {
> + return SKB_MAX_ALLOC;
> + }
> }
>
> BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
I thought we have established that checking device MTU (m*T*u)
at ingress makes a very limited amount of sense, no?
Shooting from the hip here, but won't something like:
if (!skb->dev || skb->tc_at_ingress)
return SKB_MAX_ALLOC;
return skb->dev->mtu + skb->dev->hard_header_len;
Solve your problem?