Re: [PATCH v2 2/3] x86/cpu/intel: Simplify F00F bug notice using pr_notice_once()

From: Maciej W. Rozycki

Date: Mon May 25 2026 - 06:41:32 EST


On Fri, 22 May 2026, Maciej W. Rozycki wrote:

> (now that I've looked at it again, I can see it's 6.13.0 as it's been a
> while, so maybe it's gone now in 7.x, hmm... will have to check.)

Yep, still there:

------------[ cut here ]------------
Bad FPU state detected at restore_fpregs_from_fpstate+0x48/0x50, reinitializing FPU registers.
WARNING: at fixup_exception+0x2a1/0x2c0, CPU#1: ld-linux.so.2/9621
CPU: 1 UID: 500 PID: 9621 Comm: ld-linux.so.2 Tainted: G W 7.0.0-dirty #1 PREEMPT
Tainted: [W]=WARN
Hardware name: [...]
EIP: fixup_exception+0x2a1/0x2c0
Code: 40 fe ff ff 0f 0b 8d 76 00 0f 0b ba 0c 99 a4 c0 e9 76 fe ff ff 8b 7e 30 c6 05 bf 82 9a c0 01 57 68 38 1b 8a c0 e8 1f be 00 00 <0f> 0b 58 5a e9 7b fe ff ff 8d b6 00 00 00 00 0f 0b ba 0c 99 a4 c0
EAX: 0000005e EBX: c0915690 ECX: dfbf2c84 EDX: 00000000
ESI: c2759f10 EDI: c014d9d8 EBP: c2759edc ESP: c2759e60
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010086
CR0: 80050033 CR2: b7c0e7a8 CR3: 08998000 CR4: 00000050
Call Trace:
? restore_fpregs_from_fpstate+0x48/0x50
? handle_mm_fault+0x537/0xd40
? exc_debug+0x40/0x40
math_error+0x46/0x110
exc_coprocessor_error+0x1a/0x30
handle_exception+0x14d/0x14d
EIP: restore_fpregs_from_fpstate+0x48/0x50
Code: 90 c0 21 c8 8b 0d 2c f2 90 c0 21 ca 0f ae 6b 40 5b 5d c3 8d b4 26 00 00 00 00 eb 0e cc cc cc 0f ae 4b 40 5b 5d c3 8d 74 26 00 <dd> 63 40 5b 5d c3 66 90 55 ba ff 1c 08 00 89 e5 31 c9 b8 80 e1 90
EAX: 00081cff EBX: c3f60e40 ECX: 00000000 EDX: 00000000
ESI: c3f60e00 EDI: 00000001 EBP: c2759f70 ESP: c2759f6c
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010046
? exc_debug+0x40/0x40
? exc_debug+0x40/0x40
? restore_fpregs_from_fpstate+0x48/0x50
switch_fpu_return+0x3f/0x70
ret_from_fork+0x1a9/0x200
ret_from_fork_asm+0x12/0x20
entry_INT80_32+0x10d/0x10d
EIP: 0xb7cd633e
Code: Unable to access opcode bytes at 0xb7cd6314.
EAX: 00000000 EBX: 01200011 ECX: 00000000 EDX: 00000000
ESI: 00000000 EDI: b7c0e7a8 EBP: bfa651e8 ESP: bfa651c0
DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000246
---[ end trace 0000000000000000 ]---
BUG: kernel NULL pointer dereference, address: 00000012
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
*pde = 00000000
Oops: Oops: 0000 [#1] SMP
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G W 7.0.0-dirty #1 PREEMPT
BUG: unable to handle page fault for address: c08ecfae
#PF: supervisor read access in kernel mode
#PF: e

The test system rebooted at this point and I was lucky enough to have a
peek at the console then and cross-check it with the test harness output,
so I've narrowed the reproducer down now to `math/test-fenv'. OK, that
does seem correlated and also triggers with the test run by hand.

The last output from the test is:

Test: after fesetenv (FE_NOMASK_ENV) processes will abort
when feraiseexcept (FE_DIVBYZERO) is called.
Pass: Process received SIGFPE.
Test: after fesetenv (FE_DFL_ENV) processes will not abort
when feraiseexcept (FE_DIVBYZERO) is called.

so it's `feraiseexcept' for division by zero that ultimately triggers the
issue in the newly-cloned child as the FP context is installed.

 I've added a debug call to double-check the hypothesis and retrieve the
values of CW and SW, and it confirmed an active zero-divide exception, but
it also made the crash go away, with ex_handler_fprestore() then invoked
over a dozen times during the execution of the program to completion, so
it's a heisenbug after all.
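
 For reference, a minimal sketch of how a CW/SW pair can be checked for an
active (flagged and unmasked) zero-divide exception. This is not the debug
call actually used; the helper and macro names are hypothetical, and only
the x87 bit layout is taken as given: the exception flags sit in SW bits
0-5 and the corresponding masks in CW bits 0-5.

```c
#include <stdint.h>

/* x87 exception bits: flags in the status word, masks in the control word. */
#define X87_IE		0x01u	/* invalid operation */
#define X87_DE		0x02u	/* denormal operand */
#define X87_ZE		0x04u	/* zero divide */
#define X87_OE		0x08u	/* overflow */
#define X87_UE		0x10u	/* underflow */
#define X87_PE		0x20u	/* precision */
#define X87_EXC_ALL	0x3fu

/* Hypothetical helper: exceptions flagged in SW and not masked in CW. */
static inline unsigned int x87_unmasked_pending(uint16_t cw, uint16_t sw)
{
	return sw & ~cw & X87_EXC_ALL;
}

/* Hypothetical helper: nonzero if a zero-divide exception is active. */
static inline int x87_zero_divide_active(uint16_t cw, uint16_t sw)
{
	return (x87_unmasked_pending(cw, sw) & X87_ZE) != 0;
}
```

With the power-on default CW of 0x037f all six exceptions are masked, so a
set ZE flag alone does not make the exception active; it only becomes
active once the ZM bit in CW is clear as well.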

 I'll see if I can chase it down later.  I suspect ex_handler_fprestore()
shouldn't have triggered in the first place either, as it's not ptrace(2)
or the like that has set up the FP context like this.

NB I've linked this reply back to the original thread.

Maciej