Re: [PATCH 0/2] kprobe: Fix: add symbols to kprobe blacklist

From: Masami Hiramatsu
Date: Sun Jul 16 2017 - 10:37:53 EST

Next message: Ben Hutchings: "[PATCH 3.16 052/178] sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting"
Previous message: Ian Abbott: "Re: include/linux/kernel.h:860:32: error: dereferencing pointer to incomplete type 'struct clock_event_device'"
Next in thread: Masami Hiramatsu: "Re: [PATCH 0/2] kprobe: Fix: add symbols to kprobe blacklist"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi Francis,

At first, thank you for reporting the report!

On Fri, 14 Jul 2017 10:58:33 -0400
Francis Deslauriers <francis.deslauriers@xxxxxxxxxxxx> wrote:

> Hi all,
>
> While fuzzing the Perf kprobe and kretprobe interfaces, I found some inputs
> that trigger crashes of a 4.12 kernel(6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c)
> on a x86-64 VM. I know that K(ret)probes can crash the kernel in multiple ways
> but should Perf be allowed to do it?

No, it must not be, it should be fixed.

> To do this analysis, I used the symbols reported by /proc/kallsyms in
> conjonction with the Perf debugfs interface. Using this technique, I was able
> to find two instrumentation configurations that could crash the kernel. I am
> suggesting changes that fixed both issues for me by blacklisting the symbols in
> question.
>
> Kprobe on apic_timer_interrupt:
> I believe that this is caused by the fact that kprobe adds a INT3 in a apic
> interrupt routine.
> How to reproduce:
> echo 'p:event1 apic_timer_interrupt ' > kprobe_events
> <Generate kernel activity. e.g. launch bash>
> Crash log:[1]

OK, I got this was reproduced even on qemu. (and you also seems to use qemu)

> This can be fixed by blacklisting the apicinterrupt3 symbols directly in the
> assembly macro. See patch[1/2]. I am not sure that blacklisting all
> apicinterrupt symbols is the right solution.

Wait, would you find the root cause of this crash? Your log [1] doesn't
show why this happens. It just shows there was an infinit call/fault
loop and kernel stack overflow. We should carefully analyze how it
happened before just blacklisting it.
(BTW, ahead to keep notice, but this problem can be solved by disabling
optprobe via /proc/sys/debug/kprobes-optimization.)

[ 161.618249] ? trace_hardirqs_off_caller+0x7/0xa0
[ 161.618965] ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 161.619677] ? native_iret+0x7/0x7
[ 161.620282] ? error_entry+0x72/0xd0
[ 161.620901] ? error_entry+0x72/0xd0
[ 161.621517] ? async_page_fault+0x12/0x30
[ 161.622207] ? native_iret+0x7/0x7
[ 161.622819] ? error_entry+0x72/0xd0
[ 161.623435] ? trace_hardirqs_off_caller+0x7/0xa0
[ 161.624176] ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 161.624893] ? native_iret+0x7/0x7
[ 161.625496] ? error_entry+0x72/0xd0
[ 161.626115] ? async_page_fault+0x12/0x30
[ 161.626771] ? optimized_callback+0x17/0xf0
[ 161.627443] ? 0xffffffffa000202f

$ eu-addr2line -e vmlinux optimized_callback+0x17
/home/mhiramat/ksrc/linux/include/linux/kprobes.h:384
----
382 static inline struct kprobe_ctlblk *get_kprobe_ctlblk(void)
383 {
384 return this_cpu_ptr(&kprobe_ctlblk);
385 }

And the disasm is;
ffffffff8103d716: 53 push %rbx
ffffffff8103d717: 65 4c 03 25 11 ca fc add %gs:0x7efcca11(%rip),%r12 # a130 <this_cpu_off>
ffffffff8103d71e: 7e
----

OK, it seems this_cpu_ptr() caused pagefault.

And it seems very odd stack, since we have a few unique addresses on there.

(1) async_page_fault+0x12
----
ffffffff815f6480 <async_page_fault>:
[...]
ffffffff815f6489: 48 83 c4 88 add $0xffffffffffffff88,%rsp
ffffffff815f648d: e8 2e 01 00 00 callq ffffffff815f65c0 <error_entry>
ffffffff815f6492: 48 89 e7 mov %rsp,%rdi
----

(2) error_entry+0x72
----
ffffffff815f65c0 <error_entry>:
[...]
ffffffff815f662d: e8 88 ab a0 ff callq ffffffff810011ba <trace_hardirqs_off_thunk>
ffffffff815f6632: c3 retq
----

(3) trace_hardirqs_off_thunk+0x1a
----
ffffffff810011ba <trace_hardirqs_off_thunk>:
[...]
ffffffff810011cf: e8 3c 2f 0a 00 callq ffffffff810a4110 <trace_hardirqs_off_caller>
ffffffff810011d4: eb 18 jmp ffffffff810011ee <lockdep_sys_exit_thunk+0x18>
----

(4) trace_hardirqs_off_caller+0x7
----
ffffffff810a4110: 44 8b 05 b9 11 a1 00 mov 0xa111b9(%rip),%r8d # ffffffff81ab52d0 <debug_locks>
ffffffff810a4117: 65 48 8b 14 25 00 c4 mov %gs:0xc400,%rdx
ffffffff810a411e: 00 00
----

(5) native_iret+0x7 -> this seems native_irq_return_iret
----
ffffffff815f5d00 <native_iret>:
ffffffff815f5d00: f6 44 24 20 04 testb $0x4,0x20(%rsp)
ffffffff815f5d05: 75 02 jne ffffffff815f5d09 <native_irq_return_ldt>

ffffffff815f5d07 <native_irq_return_iret>:
ffffffff815f5d07: 48 cf iretq
----

So, the story what the stack said,

- optimized_callback() calls get_kprobe_ctlblk(), and this_cpu_ptr(&kprobe_ctlblk) caused a page fault (in apic_timer_interrupt, does it cause any problem?)
- and following call-chain occured
async_page_fault -> error_entry -> trace_hardirqs_off_thunk ->
trace_hardirqs_off_caller
- "mov %gs:0xc400,%rdx" caused async_page_fault() again.

Since trace_hardirqs_off_thunk() stores general registers on
the stack, there are some noises.

[ 114.429637] FS: 00000000021e7880(0000) GS:ffff88001fd40000(0000) knlGS:0000000000000000

So, the problem seems that cpu can not access to per-cpu pages.

> Kretprobe on ftrace_ops_assist_func and another function:

For this problem, Steve gave us an good idea so I agree with him.

Thank you,

> Those crashes are triggered when hooking a kretprobe on the
> ftrace_ops_assist_func symbol and some other functions to make the this first
> function reacheable. From my understanding, ftrace_ops_assist_func is the
> function called directly when the kprobe is hit. Thus it should be marked
> with NOKPROBE_SYMBOL.
>
> Here are some configurations that can easily reproduce this bug. Those other
> functions are called during the fork of a process so they are easy to control.
> Enable the following kprobes and launch a process to trigger a fork to see the
> kernel crash.
>
> Conf #1
> echo 'r:event1 ftrace_ops_assist_func' > kprobe_events
> echo 'r:event2 clear_all_latency_tracing' > kprobe_events
> Crash log:[2]
>
> Conf #2
> echo 'r:event1 ftrace_ops_assist_func' > kprobe_events
> echo 'r:event2 acct_clear_integrals' > kprobe_events
> Crash log:[3]
>
> Conf #3
> echo 'r:event1 ftrace_ops_assist_func' > kprobe_events
> echo 'r:event2 arch_dup_task_struct' > kprobe_events
> Crash log:[4]
>
> The ftrace_ops_assist_func should be included in the kprobe blacklist using
> NOKPROBE_SYMBOL. See patch [2/2].
>
> Since those were found using fuzzing, it's not an exhaustive analysis.
> Here is the .config I am using[5].
>
> Thanks,
>
> Francis Deslauriers
> EfficiOS inc.
>
>
> [1]: https://pastebin.com/Mpp9Yzqb
> [2]: https://pastebin.com/CtsfzUwG
> [3]: https://pastebin.com/txxuJXrz
> [4]: https://pastebin.com/8qrJvzD3
> [5]: https://pastebin.com/x5q0sgyK
>
> Francis Deslauriers (2):
> kprobe: fix: Add _ASM_NOKPROBE to x86 apic interrupt macro
> kprobe: fix: Add ftrace_ops_assist_func to kprobe blacklist
>
> arch/x86/entry/entry_64.S | 1 +
> kernel/trace/ftrace.c | 2 ++
> 2 files changed, 3 insertions(+)
>
> --
> 2.7.4
>

--
Masami Hiramatsu <mhiramat@xxxxxxxxxx>

Next message: Ben Hutchings: "[PATCH 3.16 052/178] sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting"
Previous message: Ian Abbott: "Re: include/linux/kernel.h:860:32: error: dereferencing pointer to incomplete type 'struct clock_event_device'"
Next in thread: Masami Hiramatsu: "Re: [PATCH 0/2] kprobe: Fix: add symbols to kprobe blacklist"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]