Re: [linus:master] [x86/entry] be5341eb0d: WARNING:CPU:#PID:#at_int80_emulation

From: Borislav Petkov
Date: Tue Dec 19 2023 - 04:58:53 EST


On Tue, Dec 19, 2023 at 04:49:14PM +0800, kernel test robot wrote:
> [ 13.481107][ T48] WARNING: CPU: 0 PID: 48 at int80_emulation (arch/x86/entry/common.c:164)
> [ 13.481454][ T48] Modules linked in:
> [ 13.481655][ T48] CPU: 0 PID: 48 Comm: init Tainted: G N 6.7.0-rc4-00002-gbe5341eb0d43 #1
> [ 13.482162][ T48] RIP: 0010:int80_emulation (arch/x86/entry/common.c:164)

Looking at the dmesg, I think you missed the most important part - the
preceding line:

[ 13.480504][ T48] CFI failure at int80_emulation+0x67/0xb0 (target: sys_ni_posix_timers+0x0/0x70; expected type: 0xb02b34d9)
                   ^^^^^^^^^^^

[ 13.481107][ T48] WARNING: CPU: 0 PID: 48 at int80_emulation+0x67/0xb0
[ 13.481454][ T48] Modules linked in:
[ 13.481655][ T48] CPU: 0 PID: 48 Comm: init Tainted: G N 6.7.0-rc4-00002-gbe5341eb0d43 #1

The CFI bla is also in the stack trace.

Now, decode_cfi_insn() has a comment there which says what the compiler
generates about indirect call checks:

 *   movl    -<id>, %r10d       ; 6 bytes
 *   addl    -4(%reg), %r10d    ; 4 bytes
 *   je      .Ltmp1             ; 2 bytes
 *   ud2                        ; <- regs->ip
 *   .Ltmp1:


and the opcodes you decoded...

> [ 13.482437][ T48] Code: 01 00 00 77 43 89 c1 48 81 f9 c9 01 00 00 48 19 c9 21 c1 48 89 df 4c 8b 1c cd 90 12 20 9a 41 ba 27 cb d4 4f 45 03 53 fc 74 02 <0f> 0b 41 ff d3 48 89 c1 48 89 4b 50 90 48 89 df 5b 41 5e 31 c0 31
> All code
> ========
> 0: 01 00 add %eax,(%rax)
> 2: 00 77 43 add %dh,0x43(%rdi)
> 5: 89 c1 mov %eax,%ecx
> 7: 48 81 f9 c9 01 00 00 cmp $0x1c9,%rcx
> e: 48 19 c9 sbb %rcx,%rcx
> 11: 21 c1 and %eax,%ecx
> 13: 48 89 df mov %rbx,%rdi
> 16: 4c 8b 1c cd 90 12 20 mov -0x65dfed70(,%rcx,8),%r11
> 1d: 9a
> 1e: 41 ba 27 cb d4 4f mov $0x4fd4cb27,%r10d
> 24: 45 03 53 fc add -0x4(%r11),%r10d
> 28: 74 02 je 0x2c
> 2a:* 0f 0b ud2 <-- trapping instruction

... these guys here look exactly like the sequence that comment says the
compiler emits.
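
As a cross-check on the opcodes, the immediate in the movl can be pulled
out of the Code: bytes and negated; it should come out as the "expected
type" from the CFI failure line. A rough Python sketch (the helper name
is made up for illustration, it is not a kernel function):

```python
import struct

# Bytes copied from the "Code:" line of the report, from the movl that
# loads the negated type id up to and including the trapping ud2.
code = bytes.fromhex("41 ba 27 cb d4 4f 45 03 53 fc 74 02 0f 0b")


def expected_type_from_movl(insn: bytes) -> int:
    """Hypothetical helper: recover <id> from 'movl $-<id>, %r10d'."""
    # 0x41 is the REX.B prefix, 0xba is 'mov imm32' into r10d.
    if insn[0:2] != b"\x41\xba":
        raise ValueError("not a movl $imm32, %r10d")
    # The 32-bit immediate follows the two opcode bytes, little endian.
    (imm,) = struct.unpack_from("<I", insn, 2)
    # The compiler stores -<id>, so negate modulo 2**32 to get <id> back.
    return (-imm) & 0xFFFFFFFF


print(hex(expected_type_from_movl(code)))  # 0xb02b34d9
```

That matches the 0xb02b34d9 in the log, so the call site itself looks
sane and the mismatch would have to be on the target side.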

This is the first time I'm looking at this CFI bla but it sounds like it
is comparing the type hash stored in the 4 bytes right before the
syscall target, sys_ni_posix_timers, with the type the call site
expects (0xb02b34d9), and the comparison doesn't work out (%r10d is not
0 after the addl).
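
Going by the instruction sequence quoted from decode_cfi_insn() above,
the check boils down to 32-bit wrap-around arithmetic. A minimal Python
model of it, assuming nothing beyond that comment; the function name and
the mismatching hash value are invented for the example, the real hash
in front of sys_ni_posix_timers is not in the report:

```python
MASK32 = 0xFFFFFFFF


def cfi_check_passes(expected_type: int, hash_before_target: int) -> bool:
    """Model of the movl/addl/je sequence (illustrative, not kernel code)."""
    r10d = (-expected_type) & MASK32              # movl $-<id>, %r10d
    r10d = (r10d + hash_before_target) & MASK32   # addl -4(%r11), %r10d
    return r10d == 0                              # je taken, ud2 skipped


EXPECTED = 0xB02B34D9            # "expected type" from the CFI failure line
HYPOTHETICAL_OTHER = 0x12345678  # made-up stand-in for a mismatching hash

print(cfi_check_passes(EXPECTED, EXPECTED))            # True
print(cfi_check_passes(EXPECTED, HYPOTHETICAL_OTHER))  # False
```

The second case is the one that falls through to the ud2 and produces
the warning.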

There's that special symbol __cfi_sys_ni_posix_timers which also gets
generated...

Someone would need to dig into that whole CFI gunk to figure out why
this is not happy.

Oh well.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette