Re: Linux 6.11-rc1

From: Peter Zijlstra
Date: Wed Jul 31 2024 - 05:12:09 EST

Next message: Juri Lelli: "Re: [PATCH v11 7/7] sched: Split scheduler and execution contexts"
Previous message: Pavel Machek: "Re: [PATCH 6.10 000/809] 6.10.3-rc2 review"
In reply to: Borislav Petkov: "Re: Linux 6.11-rc1"
Next in thread: Borislav Petkov: "Re: Linux 6.11-rc1"

On Wed, Jul 31, 2024 at 10:21:11AM +0200, Borislav Petkov wrote:
> On Tue, Jul 30, 2024 at 04:54:43PM -0700, Linus Torvalds wrote:
> > You also seemed to say that it only happened with some CPU selections.
> > Maybe there's something wrong with the ALTERNATIVE() cleanups - I'm
> > looking at that new "nested alternatives macros" thing, and the odd
> > games we play with the origin and replacement lengths etc.
> >
> > That all looks entirely crazy. That file was hard to read before, now
> > it's just incomprehensible to me.
>
> I'm sorry to hear that. The reason we did it is because it was starting to
> become really unwieldy to add yet another alternative choice N in an
> ALTERNATIVE_N call...
>
> Anyway, I'll try to reproduce here. In the meantime, can anyone who can
> reproduce - Guenter, Jens - boot that failing kernel with
>
> debug-alternative=-1
>
> and copy dmesg and vmlinux somewhere for me?
>
> It is a lot of output so make sure to catch it all.

So what I've done instead is add nokaslr to CMDLINE and -S -s to qemu, and
I am staring at the failing kernel in gdb.

So far all the alternatives in the affected paths look just fine.

Not that any of it is making sense, notably:

Code: bf 1e c2 e9 23 06 00 00 66 90 8d 76 00 fc 6a 00 68 f0 bd 1e c2 e9 11 06 00 00 8d 76 00 fc 6a 00 68 54 c5 1e c2 e9 01 06 00 00 <8d> 76 00 fc 68 b0 e9 1e c2 e9 f3 05 00 00 66 90 8d 76 00 fc 6a 00

decodes to:

0: bf 1e c2 e9 23 mov $0x23e9c21e,%edi
5: 06 (bad)
6: 00 00 add %al,(%rax)
8: 66 90 xchg %ax,%ax
asm_exc_invalid_op:
a: 8d 76 00 lea 0x0(%rsi),%esi
d: fc cld
e: 6a 00 push $0x0
10: 68 f0 bd 1e c2 push $0xffffffffc21ebdf0
15: e9 11 06 00 00 jmp 0x62b
asm_exc_int3:
1a: 8d 76 00 lea 0x0(%rsi),%esi
1d: fc cld
1e: 6a 00 push $0x0
20: 68 54 c5 1e c2 push $0xffffffffc21ec554
25: e9 01 06 00 00 jmp 0x62b
asm_exc_page_fault:
2a:* 8d 76 00 lea 0x0(%rsi),%esi <-- trapping instruction
2d: fc cld
2e: 68 b0 e9 1e c2 push $0xffffffffc21ee9b0
33: e9 f3 05 00 00 jmp 0x62b
38: 66 90 xchg %ax,%ax
asm_exc_machine_check:
3a: 8d 76 00 lea 0x0(%rsi),%esi
3d: fc cld
3e: 6a 00 push $0x0

And that trapping instruction is the CLAC nop (still a nop in the
faulting kernel image):

(gdb) disassemble asm_exc_page_fault
Dump of assembler code for function asm_exc_page_fault:
0xc2200350 <+0>: lea 0x0(%esi),%esi
0xc2200353 <+3>: cld
0xc2200354 <+4>: push $0xc21ee9b0
0xc2200359 <+9>: jmp 0xc2200951 <handle_exception>
End of assembler dump.
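
For anyone wanting to cross-check the oops bytes against the live image, here
is a minimal sketch in Python. It assumes the Code: line is copied verbatim
from the oops above and that the instruction offsets (0x15, 0x25, 0x33) are
taken from the decode rather than found by scanning, since 0xe9 also shows up
as an immediate byte. The helper names are made up for illustration.

```python
# Bytes copied from the "Code:" line of the oops; <..> marks the trapping byte.
CODE_LINE = (
    "bf 1e c2 e9 23 06 00 00 66 90 8d 76 00 fc 6a 00 68 f0 bd 1e c2 "
    "e9 11 06 00 00 8d 76 00 fc 6a 00 68 54 c5 1e c2 e9 01 06 00 00 "
    "<8d> 76 00 fc 68 b0 e9 1e c2 e9 f3 05 00 00 66 90 8d 76 00 fc 6a 00"
)


def parse_code_line(line):
    """Return (raw bytes, offset of the byte wrapped in <>)."""
    data = bytearray()
    trap = None
    for tok in line.split():
        if tok.startswith("<") and tok.endswith(">"):
            trap = len(data)
            tok = tok[1:-1]
        data.append(int(tok, 16))
    return bytes(data), trap


def jmp_rel32_target(data, off, base=0):
    """Target of an 'e9 rel32' jmp at offset off: end of insn + signed disp."""
    assert data[off] == 0xE9, "not a jmp rel32"
    disp = int.from_bytes(data[off + 1:off + 5], "little", signed=True)
    return base + off + 5 + disp


code, trap_off = parse_code_line(CODE_LINE)
print("trap offset:", hex(trap_off))

# All three stubs in the decode jump to the same place.
for off in (0x15, 0x25, 0x33):
    print(hex(off), "->", hex(jmp_rel32_target(code, off)))

# Rebase so that offset 0x2a is asm_exc_page_fault as seen in gdb.
base = 0xC2200350 - trap_off
print("live target:", hex(jmp_rel32_target(code, 0x33, base)))
```

With the rebasing, the jmp in the page fault stub lands on the address gdb
shows for handle_exception, so the oops bytes and the live image agree on
that stub.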

And then we have the endless stream of:

asm_exc_int3+0x10/0x10

which really is: asm_exc_page_fault+0x0/0x10, but that cannot be,
because then we'd have #DF much sooner.
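
To illustrate why the same address can be printed under two names, a small
sketch in Python. It assumes the symbol lookup for backtrace entries is done
on address - 1 (a common trick for return addresses), which I have not
verified for this particular trace. The asm_exc_page_fault address comes from
the gdb dump; the asm_exc_int3 address is inferred as 0x10 below it from the
decode offsets, and the table itself is hypothetical.

```python
import bisect

# (start, name, size). Only asm_exc_page_fault's address is from gdb;
# asm_exc_int3's is inferred from the 0x1a/0x2a offsets in the decode.
SYMBOLS = [
    (0xC2200340, "asm_exc_int3", 0x10),
    (0xC2200350, "asm_exc_page_fault", 0x10),
]
STARTS = [start for start, _, _ in SYMBOLS]


def describe(addr, backtrace=False):
    """Format addr as name+off/size; backtrace mode looks up addr - 1."""
    lookup = addr - 1 if backtrace else addr
    idx = bisect.bisect_right(STARTS, lookup) - 1
    start, name, size = SYMBOLS[idx]
    return f"{name}+{addr - start:#x}/{size:#x}"


addr = 0xC2200350
print(describe(addr))                  # plain lookup
print(describe(addr, backtrace=True))  # return-address style lookup
```

Under that assumption the plain lookup names asm_exc_page_fault+0x0/0x10 and
the backtrace-style lookup names asm_exc_int3+0x10/0x10 for the same address.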

The restore_all_switch_stack+0x65/0xe6 thing looks like so in the live
kernel image:

(gdb) disassemble restore_all_switch_stack
Dump of assembler code for function entry_INT80_32:
...
0xc22008c5 <+353>: mov %cr3,%eax
0xc22008c8 <+356>: or $0x1000,%eax
0xc22008cd <+361>: mov %eax,%cr3
0xc22008d0 <+364>: mov %esi,%esi <--- here
0xc22008d2 <+366>: testl $0x2,0x34(%esp)
0xc22008da <+374>: je 0xc22008e8 <entry_INT80_32+388>
0xc22008dc <+376>: mov %cr3,%eax
0xc22008df <+379>: test $0x1000,%eax
0xc22008e4 <+384>: jne 0xc22008e8 <entry_INT80_32+388>
0xc22008e6 <+386>: ud2
0xc22008e8 <+388>: pop %ebx
...

So that is indeed BUG_IF_WRONG_CR3 and the JMP got patched to a NOP2.
Nothing strange there.

So yeah, no clue still.
