Re: Linux 6.11-rc1

From: Linus Torvalds
Date: Tue Jul 30 2024 - 14:54:10 EST


[ Adding x86-32 entry code people, more context at the thread in:

https://lore.kernel.org/all/3f65bfad-bd04-4651-bbe3-e2b1925f1a13@xxxxxxxxx/

for people who were dragged in late ]

On Tue, 30 Jul 2024 at 10:04, Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
>
> From the crash log:

The full log is more informative, at

http://server.roeck-us.net/qemu/x86-nosmp/

which has that config too.

> [ 3.605247] sr 2:0:0:0: Attached scsi generic sg0 type 5
> [ 3.764508] sched_clock: Marking stable (3740032902, 23766486)->(3766853760, -3054372)
> [ 3.768164] registered taskstats version 1
> [ 3.768271] Loading compiled-in X.509 certificates
> [ 3.990683] Btrfs loaded, zoned=no, fsverity=no
> [ 4.005012] cryptomgr_test (68) used greatest stack depth: 6136 bytes left
> [ 4.029889] traps: PANIC: double fault, error_code: 0x0

Double faults are bad bad juju: the CPU faulted while it was already
trying to deliver an earlier exception. Nasty to debug, because it
means something went wrong at a horribly bad time.

> [ 4.030613] EIP: asm_exc_page_fault+0x0/0x10

Sadly, this mainly says that taking a page fault was part of the
horribly bad time.

> [ 4.031389] <ENTRY_TRAMPOLINE>
> [ 4.031392] ? asm_exc_int3+0x10/0x10
> ...
> [ 4.033360] ? asm_exc_int3+0x10/0x10
> [ 4.033368] ? restore_all_switch_stack+0x65/0xe6
> [ 4.033386] </ENTRY_TRAMPOLINE>

Yeah "restore_all_switch_stack" is also part of "horribly bad time".

And from the full log, I see that the "..." is a *lot* of asm_exc_int3+0x10.

Which makes me think it's asm_exc_int3 just recursively failing.

Which will cause a stack overflow, and then - after a time - a double fault.

[ Time passes, I build the i386 kernel image with your config just to
get an image that looks like yours ]

Hmm. I think the stack dump output confused me.
"asm_exc_int3+0x10/0x10" doesn't make much sense on its own - an
offset of 0x10 into a symbol that is only 0x10 bytes long is one past
its end - but it turns out that "asm_exc_int3+0x10" is actually the
same address as "asm_exc_page_fault".
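
That is, the two entry points sit back to back, something like this
(addresses made up for illustration):

    c1a2b460 T asm_exc_int3          (0x10 bytes)
    c1a2b470 T asm_exc_page_fault    (== asm_exc_int3+0x10)

so an address that is really asm_exc_page_fault+0x0 gets printed
against the previous symbol.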

So it smells like we're taking a page fault, but somehow the page
fault handler's text address has been unmapped, so taking a page
fault causes another page fault, and we finally end up in that same
"no more stack, double fault" situation.

Either page table corruption, or some issue with the page table
isolation (PTI) mitigation.

The fact that it started happening with the block merge may be
because the block code causes some major corruption, or it may just
be random bad luck: the merge changed some alignment somewhere and
exposed a hidden but pre-existing issue.

Jens separately said that he can see it with gcc-11, but not his
regular compiler, so regardless it seems to be compiler-dependent.

Let's see if the x86 people have some idea. But taking that

restore_all_switch_stack+0x65/0xe6

and doing an objdump to see the code generation, it is literally here:

    0f 20 d8                mov    %cr3,%eax
    0d 00 10 00 00          or     $0x1000,%eax
    0f 22 d8                mov    %eax,%cr3
    eb 16                   jmp    <restore_all_switch_stack+0x7d>

with that "jmp" instruction being the restore_all_switch_stack+0x65 address.

So the infinite page faults seem to literally happen right after the
"mov %eax,%cr3".

Definitely something wrong with the page tables. But where that
wrongness comes from, I have no idea.

Linus