Re: [PATCH v2 2/3] x86/cpu/intel: Simplify F00F bug notice using pr_notice_once()

From: Maciej W. Rozycki

Date: Tue May 26 2026 - 20:26:03 EST

On Tue, 26 May 2026, Richard Weinberger wrote:

> >> (now that I've looked at it again, I can see it's 6.13.0 as it's been a
> >> while, so maybe it's gone now in 7.x, hmm... will have to check.)
> >
> > Yep, still there:
> >
> > ------------[ cut here ]------------
> > Bad FPU state detected at restore_fpregs_from_fpstate+0x48/0x50, reinitializing
> > FPU registers.
> > WARNING: at fixup_exception+0x2a1/0x2c0, CPU#1: ld-linux.so.2/9621
> > CPU: 1 UID: 500 PID: 9621 Comm: ld-linux.so.2 Tainted: G W
> > 7.0.0-dirty #1 PREEMPT
> > Tainted: [W]=WARN
> > Hardware name: [...]
>
> Do you see this also in qemu?

No idea, I have no QEMU setup readily available for this target.  It does
not appear to be related to things such as the cache subsystem, which QEMU
does not strive to emulate, so I'd expect this issue to trigger there too.

> I can give your test a try on my dual Pentium system.

Actually, here is about the simplest reduced reproducer.  Link with -lm.

#define _GNU_SOURCE

#include <fenv.h>
#include <stdlib.h>
#include <unistd.h>

#include <sys/wait.h>

int main(void)
{
	int status;
	pid_t pid;

	/* First child: unmask all FP exceptions, then raise one.  */
	if (!(pid = fork())) {
		fesetenv(FE_NOMASK_ENV);
		feraiseexcept(FE_DIVBYZERO);
		exit(0);
	}
	waitpid(pid, &status, 0);

	/* Second child: load the default (all-masked) environment.  */
	if (!(pid = fork())) {
		fesetenv(FE_DFL_ENV);
		feraiseexcept(FE_DIVBYZERO);
		exit(0);
	}
	waitpid(pid, &status, 0);
	return 0;
}

The primary cause is the leak of the x87 control word (CW) and status word
(SW) from the first child to the second.  It likely doesn't trigger the
chain of events on CPUs that support FXRSTOR, because that instruction
rightfully doesn't raise numeric exceptions, but I suspect the leak may
still be there, just less likely to get noticed.  Carefully crafted user
code might still get wrong results.

Second, for chips that use FRSTOR I don't think ex_handler_fprestore()
ought to call fpu_reset_from_exception_fixup() at all.  Instead it should
just raise SIGFPE with the FP context left as it is and let the signal
action handle it.  It seems to me that this path can always trigger if a
task is preempted while an unmasked numeric exception is pending in its FP
context and that context is then reloaded, and resetting the FPU at that
point will clearly break the app.

The third issue is the actual crash that follows.  It appears random,
suggesting kernel data corruption.  I've just rerun the test case above to
be sure, as I've modified the kernel since, and the box simply rebooted
after the dump from ex_handler_fprestore(), with no further output to the
console.  It might be worth tracking down even once both issues above have
been fixed, since those fixes will most probably just hide it.

Thanks for your interest, as I may not have the cycles to chase it further
in the coming days.

Maciej