Re: [PATCH bpf-next v3 0/4] xdp: recycle Page Pool backed skbs built from XDP frames

From: Ilya Leoshkevich
Date: Wed Mar 15 2023 - 14:02:06 EST


On Wed, 2023-03-15 at 15:54 +0100, Ilya Leoshkevich wrote:
> On Wed, 2023-03-15 at 11:54 +0100, Alexander Lobakin wrote:
> > From: Alexander Lobakin <aleksander.lobakin@xxxxxxxxx>
> > Date: Wed, 15 Mar 2023 10:56:25 +0100
> >
> > > From: Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx>
> > > Date: Tue, 14 Mar 2023 16:54:25 -0700
> > >
> > > > On Tue, Mar 14, 2023 at 11:52 AM Alexei Starovoitov
> > > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > >
> > > [...]
> > >
> > > > test_xdp_do_redirect:PASS:prog_run 0 nsec
> > > > test_xdp_do_redirect:PASS:pkt_count_xdp 0 nsec
> > > > test_xdp_do_redirect:PASS:pkt_count_zero 0 nsec
> > > > test_xdp_do_redirect:FAIL:pkt_count_tc unexpected pkt_count_tc: actual 220 != expected 9998
> > > > test_max_pkt_size:PASS:prog_run_max_size 0 nsec
> > > > test_max_pkt_size:PASS:prog_run_too_big 0 nsec
> > > > close_netns:PASS:setns 0 nsec
> > > > #289 xdp_do_redirect:FAIL
> > > > Summary: 270/1674 PASSED, 30 SKIPPED, 1 FAILED
> > > >
> > > > Alex,
> > > > could you please take a look at why it's happening?
> > > >
> > > > I suspect it's an endianness issue in:
> > > >         if (*metadata != 0x42)
> > > >                 return XDP_ABORTED;
> > > > but your patch didn't change that,
> > > > so I'm not sure why it worked before.
> > >
> > > Sure, lemme fix it real quick.
> >
> > Hi Ilya,
> >
> > Do you have s390 testing setups? Maybe you could take a look, since
> > I don't have one and can't debug it? It doesn't seem to be an
> > endianness issue. I mean, I have this (the patch below), but I'm not
> > sure it will fix anything, since IIRC the eBPF arch always matches
> > the host arch ._.
> > I can't figure out from the code what goes wrong :s And it only
> > happens on s390.
> >
> > Thanks,
> > Olek
> > ---
> > diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
> > index 662b6c6c5ed7..b21371668447 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
> > @@ -107,7 +107,7 @@ void test_xdp_do_redirect(void)
> >                             .attach_point = BPF_TC_INGRESS);
> >  
> >         memcpy(&data[sizeof(__u32)], &pkt_udp, sizeof(pkt_udp));
> > -       *((__u32 *)data) = 0x42; /* metadata test value */
> > +       *((__u32 *)data) = htonl(0x42); /* metadata test value */
> >  
> >         skel = test_xdp_do_redirect__open();
> >         if (!ASSERT_OK_PTR(skel, "skel"))
> > diff --git a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> > index cd2d4e3258b8..2475bc30ced2 100644
> > --- a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> > +++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> > @@ -1,5 +1,6 @@
> >  // SPDX-License-Identifier: GPL-2.0
> >  #include <vmlinux.h>
> > +#include <bpf/bpf_endian.h>
> >  #include <bpf/bpf_helpers.h>
> >  
> >  #define ETH_ALEN 6
> > @@ -28,7 +29,7 @@ volatile int retcode = XDP_REDIRECT;
> >  SEC("xdp")
> >  int xdp_redirect(struct xdp_md *xdp)
> >  {
> > -       __u32 *metadata = (void *)(long)xdp->data_meta;
> > +       __be32 *metadata = (void *)(long)xdp->data_meta;
> >         void *data_end = (void *)(long)xdp->data_end;
> >         void *data = (void *)(long)xdp->data;
> >  
> > @@ -44,7 +45,7 @@ int xdp_redirect(struct xdp_md *xdp)
> >         if (metadata + 1 > data)
> >                 return XDP_ABORTED;
> >  
> > -       if (*metadata != 0x42)
> > +       if (*metadata != __bpf_htonl(0x42))
> >                 return XDP_ABORTED;
> >  
> >         if (*payload == MARK_XMIT)
>
> Okay, I'll take a look. Two quick observations for now:
>
> - Unfortunately the above patch does not help.
>
> - In dmesg I see:
>
>     Driver unsupported XDP return value 0 on prog xdp_redirect (id 23) dev N/A, expect packet loss!

I haven't identified the issue yet, but I have found a couple more
things that might be helpful:

- In problematic cases metadata contains 0, so this is not an
endianness issue. data is still reasonable though. I'm trying to
understand what is causing this.

- Applying the following diff:

--- a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
+++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
@@ -52,7 +52,7 @@ int xdp_redirect(struct xdp_md *xdp)
 
         *payload = MARK_IN;
 
-        if (bpf_xdp_adjust_meta(xdp, 4))
+        if (false && bpf_xdp_adjust_meta(xdp, 4))
                 return XDP_ABORTED;
 
         if (retcode > XDP_PASS)

causes a kernel panic even on x86_64:

BUG: kernel NULL pointer dereference, address: 0000000000000d28
...
Call Trace:
 <TASK>
 build_skb_around+0x22/0xb0
 __xdp_build_skb_from_frame+0x4e/0x130
 bpf_test_run_xdp_live+0x65f/0x7c0
 ? __pfx_xdp_test_run_init_page+0x10/0x10
 bpf_prog_test_run_xdp+0x2ba/0x480
 bpf_prog_test_run+0xeb/0x110
 __sys_bpf+0x2b9/0x570
 __x64_sys_bpf+0x1c/0x30
 do_syscall_64+0x48/0xa0
 entry_SYSCALL_64_after_hwframe+0x72/0xdc

I haven't looked into this at all yet, but I believe it needs to be
fixed: a BPF program should never be able to cause a kernel panic.