Re: There was missing ENDBR BUG in 5.19-rc3 mainline kernel on TGL-U

From: Peter Zijlstra
Date: Tue Jun 28 2022 - 06:57:58 EST


On Tue, Jun 28, 2022 at 04:28:58PM +0800, Pengfei Xu wrote:
> Hi Peter,
>
> Greeting!
>
> We found one "missing ENDBR BUG" on 5.19-rc3 kernel.
>
> Platform: TGL-U
> Kernel: 5.19-rc3 mainline
>
> 1. Boot up TGL-U
> 2. Execute kernel self-test shell script "ftracetest" in
> kernel_source/tools/testing/selftests/ftrace/
> # ./ftracetest
> === Ftrace unit tests ===
> [1] Basic trace file check [PASS]
> [2] Basic test for tracers [PASS]
> [3] Basic trace clock test [PASS]
> [4] Basic event tracing check [PASS]
> [5] Change the ringbuffer size [PASS]
> [6] Snapshot and tracing setting [PASS]
> [7] trace_pipe and trace_marker [PASS]
> [8] Test ftrace direct functions against tracers [UNRESOLVED]
> [9] Test ftrace direct functions against kprobes [UNRESOLVED]
> [10] Generic dynamic event - add/remove eprobe events [FAIL]
> [11] Generic dynamic event - add/remove kprobe events
>
> It 100% reproduced in step 11 and then missing ENDBR BUG generated:
> "
> [ 9332.752836] mmiotrace: enabled CPU7.
> [ 9332.788612] mmiotrace: disabled.
> [ 9337.103426] traps: Missing ENDBR: syscall_regfunc+0x0/0xb0
> [ 9337.103442] ------------[ cut here ]------------
> [ 9337.103444] kernel BUG at arch/x86/kernel/traps.c:253!
> [ 9337.103452] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> ...
> [ 9337.103506] Call Trace:
> ...
> [ 9337.103512] asm_exc_control_protection+0x30/0x40
> ...
> [ 9337.103540] ? trace_module_has_bad_taint+0x20/0x20
> [ 9337.103547] ? tracepoint_add_func+0x15f/0x360
> [ 9337.103551] ? perf_syscall_enter+0x1f0/0x1f0
> [ 9337.103556] tracepoint_probe_register_prio+0x5c/0x90
> [ 9337.103560] ? perf_syscall_enter+0x1f0/0x1f0
> "
>
> Dmesg was in attached.
> Do I need to do something further for this problem?

Your .config would have been useful, as would a Cc to lkml.

defconfig + kvm_guest.config + x86_debug.config + X86_KERNEL_IBT + a lot
of tracing options gets me:

$ ./scripts/objdump-func defconfig-build/vmlinux.o syscall_regfunc
0000 0000000000181120 <syscall_regfunc>:
0000 181120: f3 0f 1e fa endbr64
...

So the function does have an ENDBR for me. Now the other possibility
is that the ENDBR got scribbled by the sealing.

$ readelf -Wa defconfig-build/vmlinux.o | awk '/Relocation section.*ibt_endbr_seal/ { P=1 } /^$/ { if (P) exit } { if (P) print $0 }' | grep 181120
00000000000022a8 0000000200000002 R_X86_64_PC32 0000000000000000 .text + 181120

And yes, that's it. So objtool somehow misses that the address of this
function is taken.
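
To make the failure mode concrete, here is a minimal userspace sketch (not kernel code) of what sealing does to a function entry. The endbr64 bytes are the ones from the objdump output above; the helper names and the replacement bytes are assumptions chosen for illustration, not what the kernel actually writes.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* endbr64 encoding, as seen in the objdump output: f3 0f 1e fa */
static const uint8_t ENDBR64[4] = { 0xf3, 0x0f, 0x1e, 0xfa };

/* Illustrative replacement only (a 4-byte NOP); the real bytes are an assumption. */
static const uint8_t SEAL_BYTES[4] = { 0x0f, 0x1f, 0x40, 0x00 };

/* True if the function entry at 'text' begins with endbr64. */
static bool starts_with_endbr64(const uint8_t *text)
{
	return memcmp(text, ENDBR64, sizeof(ENDBR64)) == 0;
}

/*
 * "Seal" a function entry: overwrite its ENDBR so that an indirect
 * branch landing here no longer finds one. Returns false if there was
 * no ENDBR to overwrite.
 */
static bool seal_entry(uint8_t *text)
{
	if (!starts_with_endbr64(text))
		return false;
	memcpy(text, SEAL_BYTES, sizeof(SEAL_BYTES));
	return true;
}
```

Once an entry is sealed like this, a later indirect call to it is exactly the situation the "Missing ENDBR" trap reports.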

If we grep around:

$ git grep syscall_regfunc
include/linux/tracepoint.h:extern int syscall_regfunc(void);
include/trace/events/syscalls.h: syscall_regfunc, syscall_unregfunc
include/trace/events/syscalls.h: syscall_regfunc, syscall_unregfunc
kernel/tracepoint.c:int syscall_regfunc(void)

we find it is only used in tracepoints, which then suggests the
following patch:

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 864bb9dd3584..57153e00349c 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -3826,8 +3826,7 @@ static int validate_ibt(struct objtool_file *file)
 		    !strcmp(sec->name, "__bug_table") ||
 		    !strcmp(sec->name, "__ex_table") ||
 		    !strcmp(sec->name, "__jump_table") ||
-		    !strcmp(sec->name, "__mcount_loc") ||
-		    !strcmp(sec->name, "__tracepoints"))
+		    !strcmp(sec->name, "__mcount_loc"))
 			continue;
 
 		list_for_each_entry(reloc, &sec->reloc->reloc_list, list)

And that does indeed seem to do the trick!
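
For anyone wanting to see why the skip list matters, below is a rough standalone model of the check, assuming the only reference to syscall_regfunc sits in the __tracepoints section. The struct and function names are made up for the example, and only the section names visible in the hunk above are listed; it is a sketch of the idea, not objtool's actual data structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one relocation: the section holding it and the function it targets. */
struct ref {
	const char *section;
	const char *target;
};

/*
 * Sections whose relocations are ignored when looking for address-taken
 * functions. skip_tracepoints == true models the code before the patch.
 */
static bool section_is_skipped(const char *name, bool skip_tracepoints)
{
	if (!strcmp(name, "__bug_table") ||
	    !strcmp(name, "__ex_table") ||
	    !strcmp(name, "__jump_table") ||
	    !strcmp(name, "__mcount_loc"))
		return true;

	return skip_tracepoints && !strcmp(name, "__tracepoints");
}

/* A function counts as address-taken if a reference in a non-skipped section targets it. */
static bool address_taken(const struct ref *refs, size_t n,
			  const char *func, bool skip_tracepoints)
{
	for (size_t i = 0; i < n; i++) {
		if (section_is_skipped(refs[i].section, skip_tracepoints))
			continue;
		if (!strcmp(refs[i].target, func))
			return true;
	}
	return false;
}
```

With skip_tracepoints set, a function referenced only from __tracepoints looks unreferenced and so becomes a sealing candidate; with it cleared, as in the patch, the reference is seen and the ENDBR stays.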