Re: kprobes broken since 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")

From: Nikolay Borisov
Date: Thu Jan 28 2021 - 11:49:10 EST




On 28.01.21 г. 18:12 ч., Nikolay Borisov wrote:
>
>
> On 28.01.21 г. 5:38 ч., Masami Hiramatsu wrote:
>> Hi,
>
> <snip>
>
>>
>> Alexei, could you tell me what is the concerning situation for bpf?
>
> Another data point, Masami, is that this affects bpf kprobes which are
> entered via int3; if the kprobe is instead entered via
> kprobe_ftrace_handler it works as expected. I haven't been able to
> determine why a particular bpf probe won't use ftrace's infrastructure
> if it's put at the beginning of the function. An alternative call chain
> is:
>
> => __ftrace_trace_stack
> => trace_call_bpf
> => kprobe_perf_func
> => kprobe_ftrace_handler
> => 0xffffffffc095d0c8
> => btrfs_validate_metadata_buffer
> => end_bio_extent_readpage
> => end_workqueue_fn
> => btrfs_work_helper
> => process_one_work
> => worker_thread
> => kthread
> => ret_from_fork
>
>>

I have a working theory about why I'm seeing this. My (broken) kernel was
compiled with retpolines off and with the gcc that comes with Ubuntu
(both 9 and 10):

gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
gcc-10 (Ubuntu 10.2.0-5ubuntu1~20.04) 10.2.0

This results in CFI being enabled, so functions look like:

0xffffffff81493890 <+0>: endbr64
0xffffffff81493894 <+4>: callq 0xffffffff8104d820 <__fentry__>

i.e. the call to __fentry__ is not the first instruction of the function,
so the probe does not go through the optimized ftrace handler. Instead it
uses int3, which is broken, as already established in this thread.

After testing with my testcase I can confirm that with CFI off, and the
__fentry__ call being the first instruction, bpf starts working. And
indeed, even with CFI turned on, if I use a probe like:

bpftrace -e 'kprobe:btrfs_sync_file+4 {printf("kprobe: %s\n",
kstack());}' &>bpf-output &


it is placed on the __fentry__ call (and not on endbr64), hence it works.
So perhaps a workaround outside of bpf could detect this scenario and
move the probe onto the __fentry__ call, rather than the preceding
instruction, when that preceding instruction is endbr64?
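
To make the idea concrete, here is a rough userspace sketch of the check I
have in mind. It is not kernel code: probe_offset_skipping_endbr64() is a
hypothetical helper name, and I'm assuming the caller already has the bytes
at function entry in hand. The only hard fact used is that endbr64 encodes
as f3 0f 1e fa (4 bytes), which matches the +4 offset in the disassembly
above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* endbr64 is a fixed 4-byte instruction: f3 0f 1e fa. */
static const unsigned char ENDBR64[4] = { 0xf3, 0x0f, 0x1e, 0xfa };

/*
 * Hypothetical helper (not an existing kernel function).
 *
 * insn: bytes starting at the function's entry address
 * len:  number of valid bytes in insn
 *
 * Returns the offset from function entry at which the probe should be
 * placed: 4 if the function starts with endbr64 (so the probe lands on
 * the instruction that follows it, e.g. the __fentry__ call), 0 otherwise.
 */
static size_t probe_offset_skipping_endbr64(const unsigned char *insn,
					    size_t len)
{
	assert(insn != NULL);

	if (len >= sizeof(ENDBR64) &&
	    memcmp(insn, ENDBR64, sizeof(ENDBR64)) == 0)
		return sizeof(ENDBR64);

	return 0;
}
```

With something like this, a probe requested at function+0 would be moved to
function+4 only when the first instruction is endbr64, which is what the
manual btrfs_sync_file+4 probe above does by hand. Whether such an
adjustment belongs in kprobes or in the tooling is a separate question.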



<snip>