Re: [PATCH] ftrace/x86: Fix function graph tracer reset path

From: Steven Rostedt
Date: Fri May 13 2016 - 10:06:19 EST


Matt,

This bug looks very similar to what you were hitting with the function
profiler. Can you apply this patch and see if it fixes the issue for
you.

Thanks!

-- Steve


On Fri, 13 May 2016 22:53:43 +0900
Namhyung Kim <namhyung@xxxxxxxxxx> wrote:

> On my system, simply enabling and disabling function graph tracer can
> crash the kernel. I don't know how it worked until now.
>
> The ftrace_disable_ftrace_graph_caller() modifies jmp instruction at
> ftrace_graph_call assuming it's a 5 bytes near jmp (e9 <offset>).
> However it's a short jmp consisting of 2 bytes only (eb <offset>). And
> ftrace_stub() is located just below the ftrace_graph_caller so
> modification above breaks the instruction resulting in kernel oops on
> the ftrace_stub() with the invalid opcode like below:
>
> # cd /sys/kernel/trace
> # echo function_graph > current_tracer
> # echo nop > current_tracer
>
> [ 78.122055] invalid opcode: 0000 [#1] SMP
> [ 78.125125] Modules linked in: x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32c_intel pcspkr iwldvm iwlwifi
> [ 78.128241] CPU: 2 PID: 17 Comm: migration/2 Not tainted 4.6.0-rc4+ #36
> [ 78.131310] Hardware name: LENOVO 4286A74/4286A74, BIOS 8DET56WW (1.26 ) 12/01/2011
> [ 78.134369] task: ffff88040bec4240 ti: ffff88040bee4000 task.ti: ffff88040bee4000
> [ 78.137412] RIP: 0010:[<ffffffff818939a8>] [<ffffffff818939a8>] ftrace_stub+0x0/0x8
> [ 78.140477] RSP: 0018:ffff88040bee7e48 EFLAGS: 00010246
> [ 78.143489] RAX: ffff88040bec4240 RBX: ffffffff8107a7b0 RCX: 000000000000001f
> [ 78.146512] RDX: 0000000000000000 RSI: ffff88041e2929d8 RDI: ffff88040bee7e50
> [ 78.149581] RBP: ffff88040bee7e80 R08: ffff88040bee4000 R09: 0000000000000000
> [ 78.152647] R10: 00000000000318b7 R11: ffff8800d661f800 R12: ffff88040d8011b0
> [ 78.155679] R13: ffffffff81e43620 R14: ffff88040bda8588 R15: ffffffff81e503e0
> [ 78.158675] FS: 0000000000000000(0000) GS:ffff88041e280000(0000) knlGS:0000000000000000
> [ 78.161699] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 78.164690] CR2: 00007fadb22dde1d CR3: 00000000d5ce2000 CR4: 00000000000406e0
> [ 78.167691] Stack:
> [ 78.170658] ffffffff8110b3ee ffffffff8188de90 00000009161d55f6 00000012306fc2e4
> [ 78.173710] 0000000000000000 ffff880400000000 ffff88040bec4240 ffff88040bee7ec8
> [ 78.176783] ffffffff81893bbd 0000000000000000 ffff88040bec4240 ffffffff81893ba8
> [ 78.179863] Call Trace:
> [ 78.182853] [<ffffffff8110b3ee>] ? ftrace_return_to_handler+0x8e/0x100
> [ 78.185909] [<ffffffff8188de90>] ? __schedule+0xae0/0xae0
> [ 78.188941] [<ffffffff81893bbd>] return_to_handler+0x15/0x27
> [ 78.192001] [<ffffffff81893ba8>] ? ftrace_graph_caller+0xa8/0xa8
> [ 78.195091] [<ffffffff8107a6f0>] ? sort_range+0x30/0x30
> [ 78.198138] [<ffffffff810778a9>] kthread+0xc9/0xe0
> [ 78.201143] [<ffffffff81891a12>] ret_from_fork+0x22/0x40
> [ 78.204138] [<ffffffff810777e0>] ? kthread_worker_fn+0x170/0x170
> [ 78.207129] Code: 8b 44 24 48 48 8b 7c 24 70 48 8b 74 24 68 48 8b 54 24 60
> 48 8b 4c 24 58 48 8b 44 24 50 48 8b 6c 24 20 48 81 c4 d0
> 00 00 00 e9 fd <ff> ff ff 80 00 00 00 00 9c 55 ff 74 24 18
> 55 48 89 e5 ff 74 24
> [ 78.213997] RIP [<ffffffff818939a8>] ftrace_stub+0x0/0x8
> [ 78.217190] RSP <ffff88040bee7e48>
> [ 78.220374] ---[ end trace 0af2f0d9f7301011 ]---
>
> Looking at the code dump, the ftrace_stub instruction was overwritten as
> <ff>. Below is disassembly output of related code.
>
> $ objdump -d --start-address=0xffffffff818939a6 --stop-address=0xffffffff818939b0 vmlinux
>
> vmlinux: file format elf64-x86-64
>
> Disassembly of section .text:
>
> ffffffff818939a6 <ftrace_epilogue>:
> ffffffff818939a6: eb 00 jmp ffffffff818939a8 <ftrace_stub>
>
> ffffffff818939a8 <ftrace_stub>:
> ffffffff818939a8: c3 retq
> ffffffff818939a9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
>
> As you can see ftrace_epilogue (same as ftrace_graph_caller) is 2 byte
> ahead of ftrace_stub. And it's replaced by a jump to ftrace_stub() by
> ftrace_disable_ftrace_graph_caller: "e9 <0xfffffffd>". Pads 3 bytes
> after ftrace_epilogue to prevent ftrace_stub from being overwritten.
>
> Signed-off-by: Namhyung Kim <namhyung@xxxxxxxxxx>
> ---
> arch/x86/kernel/mcount_64.S | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/arch/x86/kernel/mcount_64.S b/arch/x86/kernel/mcount_64.S
> index ed48a9f465f8..0e6af57a713a 100644
> --- a/arch/x86/kernel/mcount_64.S
> +++ b/arch/x86/kernel/mcount_64.S
> @@ -180,6 +180,15 @@ GLOBAL(ftrace_epilogue)
> #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> GLOBAL(ftrace_graph_call)
> jmp ftrace_stub
> + /*
> + * The above jmp is generated as a short jump which occupies 2 bytes
> + * but ftrace_enable/disable_ftrace_graph_caller() assumes it's a
> + * near jump which occupies 5 bytes so breaks ftrace_stub() below.
> + * Add 3 bytes padding to avoid that.
> + */
> + nop
> + nop
> + nop
> #endif
>
> GLOBAL(ftrace_stub)