Re: kprobes broken since 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")

From: Masami Hiramatsu
Date: Wed Jan 27 2021 - 23:10:30 EST


Hi,

On Thu, 28 Jan 2021 10:34:15 +0900
Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:

> On Wed, 27 Jan 2021 19:57:56 +0200
> Nikolay Borisov <nborisov@xxxxxxxx> wrote:
>
> >
> >
> > On 27.01.21 г. 17:24 ч., Masami Hiramatsu wrote:
> > > On Thu, 28 Jan 2021 00:13:53 +0900
> > > Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:
> > >
> > >> Hi Nikolay,
> > >>
> > >> On Wed, 27 Jan 2021 15:43:29 +0200
> > >> Nikolay Borisov <nborisov@xxxxxxxx> wrote:
> > >>
> > >>> Hello,
> > >>>
> > >>> I'm currently seeing latest Linus' master being somewhat broken w.r.t
> > >>> krpobes. In particular I have the following test-case:
> > >>>
> > >>> #!/bin/bash
> > >>>
> > >>> mkfs.btrfs -f /dev/vdc &> /dev/null
> > >>> mount /dev/vdc /media/scratch/
> > >>>
> > >>> bpftrace -e 'kprobe:btrfs_sync_file {printf("kprobe: %s\n", kstack());}'
> > >>> &>bpf-output &
> > >>> bpf_trace_pid=$!
> > >>>
> > >>> # force btrfs_sync_file to be called
> > >>> sleep 2
> > >>> xfs_io -f -c "pwrite 0 4m" -c "fsync" /media/scratch/file5
> > >>>
> > >>> kill $bpf_trace_pid
> > >>> sleep 1
> > >>>
> > >>> grep -q kprobe bpf-output
> > >>> retval=$?
> > >>> rm -f bpf-output
> > >>> umount /media/scratch
> > >>>
> > >>> exit $retval
> > >>>
> > >>> It traces btrfs_sync_file which is called when fsync is executed on a
> > >>> btrfs file, however I don't see the stacktrace being printed i.e the
> > >>> kprobe doesn't fire at all. The following alternative program:
> > >>>
> > >>> bpftrace -e 'tracepoint:btrfs:btrfs_sync_file {printf("tracepoint:
> > >>> %s\n", kstack());} kprobe:btrfs_sync_file {printf("kprobe: %s\n",
> > >>> kstack());}'
> > >>>
> > >>> only prints the stack from the tracepoint and not from the kprobe, given
> > >>> that the tracepoint is called from the btrfs_sync_file function.
> > >>
> > >> Thank you for reporting!
> > >>
> > >> If you don't mind, could you confirm it with ftrace (tracefs)?
> > >> bpftrace etc. involves too many things. It is better to test with
> > >> simpler way to test it.
> > >> I'm not familer with the bpftrace, but I think you can check it with
> > >>
> > >> # cd /sys/kernel/debug/tracing
> > >> # echo p:myevent btrfs_sync_file >> kprobe_events
> > >> # echo stacktrace > events/kprobes/myevent/trigger
> > >> (or echo 1 > options/stacktrace , if trigger file doesn't exist)
> > >
> > > Of course, also you have to enable the event.
> > > # echo 1 > events/kprobes/myevent/enable
> > >
> > > And check the results
> > >
> > > # cat trace
> > >
> > >
> > >> Could you also share your kernel config, so that we can reproduce it?
> > >
> >
> > I've attached the config and indeed with the scenario you proposed it
> > seems to works. I see:
> >
> > xfs_io-20280 [000] d.Z. 9900.748633: myevent:
> > (btrfs_sync_file+0x0/0x580)
> > xfs_io-20280 [000] d.Z. 9900.748647: <stack trace>
> > => kprobe_trace_func
> > => kprobe_dispatcher
> > => kprobe_int3_handler
> > => exc_int3
> > => asm_exc_int3
> > => btrfs_sync_file
> > => do_fsync
> > => __x64_sys_fsync
> > => do_syscall_64
> > => entry_SYSCALL_64_after_hwframe
>
> Hmm, then there might be a problem in bpftrace or ebpf (need more info).
> At least kprobes itself isn't broken.
> I guess they check "in_nmi()" and skip such event?

Yeah, there is. Nikolay, could you try this tentative patch?

Of course this just drops the NMI check from the handler, so alternative
checker is required. But I'm not sure what the original code concerns.
As far as I can see, there seems no re-entrant block flag, nor locks
among ebpf programs in runtime.

Alexei, could you tell me what is the concerning situation for bpf?

Thank you,