Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp

From: Jiri Olsa
Date: Thu Dec 08 2022 - 18:02:38 EST


On Thu, Dec 08, 2022 at 11:26:45PM +0100, Jiri Olsa wrote:
> On Thu, Dec 08, 2022 at 07:06:59PM +0100, Jiri Olsa wrote:
> > On Thu, Dec 08, 2022 at 09:48:52AM -0800, Alexei Starovoitov wrote:
> > > On Wed, Dec 7, 2022 at 11:57 AM Alexei Starovoitov
> > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > >
> > > > On Tue, Dec 6, 2022 at 7:18 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
> > > > >
> > > > > On Tue, Dec 06, 2022 at 02:46:43PM +0800, Hao Sun wrote:
> > > > > > Hao Sun <sunhao.th@xxxxxxxxx> wrote on Tue, Dec 6, 2022 at 11:28:
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > The following crash can be triggered with the BPF prog provided.
> > > > > > > It seems the verifier passed some invalid progs. I will try to simplify
> > > > > > > the C reproducer; for now, the following can reproduce this:
> > > > > > >
> > > > > > > HEAD commit: ab0350c743d5 selftests/bpf: Fix conflicts with built-in
> > > > > > > functions in bpf_iter_ksym
> > > > > > > git tree: bpf-next
> > > > > > > console log: https://pastebin.com/raw/87RCSnCs
> > > > > > > kernel config: https://pastebin.com/raw/rZdWLcgK
> > > > > > > Syz reproducer: https://pastebin.com/raw/4kbwhdEv
> > > > > > > C reproducer: https://pastebin.com/raw/GFfDn2Gk
> > > > > > >
> > > > > >
> > > > > > Simplified C reproducer: https://pastebin.com/raw/aZgLcPvW
> > > > > >
> > > > > > Only two syscalls are required to reproduce this; it seems to be an issue
> > > > > > in XDP test run. Essentially, the reproducer just loads a very simple
> > > > > > prog and test-runs it repeatedly and concurrently:
> > > > > >
> > > > > > r0 = bpf$PROG_LOAD(0x5, &(0x7f0000000640)=@base={0x6, 0xb,
> > > > > > &(0x7f0000000500)}, 0x80)
> > > > > > bpf$BPF_PROG_TEST_RUN(0xa, &(0x7f0000000140)={r0, 0x0, 0x0, 0x0, 0x0,
> > > > > > 0x0, 0xffffffff, 0x0, 0x0, 0x0, 0x0, 0x0}, 0x48)
> > > > > >
> > > > > > Loaded prog:
> > > > > > 0: (18) r0 = 0x0
> > > > > > 2: (18) r6 = 0x0
> > > > > > 4: (18) r7 = 0x0
> > > > > > 6: (18) r8 = 0x0
> > > > > > 8: (18) r9 = 0x0
> > > > > > 10: (95) exit
> > > > >
> > > > > hi,
> > > > > I can reproduce with your config.. it seems related to the
> > > > > recent static call change:
> > > > > c86df29d11df bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace)
> > > > >
> > > > > I can't reproduce when I revert that commit.. Peter, any idea?
> > > >
> > > > Jiri,
> > > >
> > > > I see your tested-by tag on Peter's commit c86df29d11df.
> > > > I assume you've actually tested it, but
> > > > this syzbot oops shows that even an empty bpf prog crashes,
> > > > so there is something wrong with that commit.
> > > >
> > > > What is the difference between this new kconfig and the old one that
> > > > you've tested?
>
> I attached the diff; 'config-issue' is the one that reproduces the issue
>
> > > >
> > > > I'm trying to understand the severity of the issues and
> > > > whether we need to revert that commit asap since the merge window
> > > > is about to start.
> > >
> > > Jiri, Peter,
> > >
> > > ping.
> > >
> > > cc-ing Thorsten, since he's tracking it now.
> > >
> > > The config has CONFIG_X86_KERNEL_IBT=y.
> > > Is it related?
> >
> > sorry for the late reply.. I still haven't found the cause,
> > but I haven't tried with IBT yet, will test now
>
> no difference with IBT enabled, can't reproduce the issue
>

ok, scratch that.. the reproducer got stuck on wifi init :-\

after fixing that, I can now reproduce it on my local config with
IBT either enabled or disabled.. so it's something else

jirka