Re: [PATCH] Revert "x86/fpu: Refine and simplify the magic number check during signal return"

From: Chang S. Bae

Date: Wed Apr 29 2026 - 03:30:35 EST

On 4/28/2026 5:06 PM, Andrei Vagin wrote:

> The reverted commit broke applications that construct signal frames in
> userspace (such as CRIU and gVisor) if the frame's xstate size is
> smaller than the kernel's fpstate->user_size.

In the extended state area, the sigframe embeds the hardware-defined XSAVE format. If CPU A and CPU B support different XSTATE features, the layout (sizes and offsets) differs across systems. However, within a system, the layout is invariant. Userspace can query CPUID to obtain the exact offsets and sizes, which effectively defines the ABI.
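
For illustration, a minimal sketch of that CPUID query, assuming GCC or Clang on x86 (for <cpuid.h> and __get_cpuid_count()). The struct xcomp type and both function names are hypothetical helpers, not kernel or libc interfaces. CPUID leaf 0xD, sub-leaf i >= 2, reports the size of component i in EAX and its offset in the standard (non-compacted) format in EBX. Note that sub-leaf 0 reports what the CPU supports, which can be a superset of what the OS has enabled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

#define XSAVE_LEGACY_SIZE 512u	/* FXSAVE region */
#define XSAVE_HDR_SIZE     64u	/* XSAVE header */

/* Hypothetical helper type: where one state component lives. */
struct xcomp {
	uint32_t offset;
	uint32_t size;
};

/*
 * Size of the standard-format XSAVE area for a feature mask: the end of
 * the highest enabled extended component, never less than 512 + 64.
 */
static uint32_t xsave_std_size(uint64_t xfeatures, const struct xcomp comp[64])
{
	uint32_t end = XSAVE_LEGACY_SIZE + XSAVE_HDR_SIZE;

	assert(comp != NULL);
	for (int i = 2; i < 64; i++) {
		if (!(xfeatures & (1ULL << i)))
			continue;
		if (comp[i].offset + comp[i].size > end)
			end = comp[i].offset + comp[i].size;
	}
	return end;
}

/*
 * Fill comp[] from CPUID leaf 0xD. Returns the mask of user state
 * components the CPU supports, or 0 if the leaf is unavailable
 * (including on non-x86 builds).
 */
static uint64_t xsave_query_layout(struct xcomp comp[64])
{
	uint64_t mask = 0;

	memset(comp, 0, 64 * sizeof(comp[0]));
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(0xD, 0, &eax, &ebx, &ecx, &edx))
		return 0;
	mask = ((uint64_t)edx << 32) | eax;

	for (int i = 2; i < 64; i++) {
		if (!(mask & (1ULL << i)))
			continue;
		__get_cpuid_count(0xD, i, &eax, &ebx, &ecx, &edx);
		comp[i].size = eax;	/* bytes used by component i */
		comp[i].offset = ebx;	/* offset from start of XSAVE area */
	}
#endif
	return mask;
}
```

A caller would run xsave_query_layout() once and then pass the mask of features it cares about to xsave_std_size() to get the buffer size for that feature set.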

On top of the XSAVE data, the kernel appends metadata (e.g. the xstate size and magic values). In particular, fpstate->user_size is written by save_sw_bytes() at signal delivery. On sigreturn, the kernel validates the frame against the same value, which makes the check symmetric and straightforward.
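
To make the symmetry concrete, here is a small userspace model of the behavior described in this thread. It is a sketch and not the kernel code: model_save_sw_bytes() and model_check_xstate() are hypothetical names, the 64-bit frame layout is assumed (software bytes at offset 464 of the FXSAVE region), and the caller is assumed to provide a buffer of at least user_size + 4 bytes. The struct mirrors the uapi struct _fpx_sw_bytes and the magic values are the uapi constants.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FP_XSTATE_MAGIC1	0x46505853u
#define FP_XSTATE_MAGIC2	0x46505845u
#define FP_XSTATE_MAGIC2_SIZE	4u
#define SW_BYTES_OFFSET		464u	/* sw_reserved inside the FXSAVE region */

/* Mirrors the uapi struct _fpx_sw_bytes (48 bytes). */
struct fpx_sw_bytes {
	uint32_t magic1;
	uint32_t extended_size;
	uint64_t xfeatures;
	uint32_t xstate_size;
	uint32_t padding[7];
};

/*
 * Model of signal delivery: record the size in the software bytes and
 * place MAGIC2 immediately after the XSAVE data.
 */
static void model_save_sw_bytes(uint8_t *frame, uint32_t user_size,
				uint64_t xfeatures)
{
	struct fpx_sw_bytes sw = {
		.magic1		= FP_XSTATE_MAGIC1,
		.extended_size	= user_size + FP_XSTATE_MAGIC2_SIZE,
		.xfeatures	= xfeatures,
		.xstate_size	= user_size,
	};
	uint32_t magic2 = FP_XSTATE_MAGIC2;

	assert(user_size >= 512);
	memcpy(frame + SW_BYTES_OFFSET, &sw, sizeof(sw));
	memcpy(frame + user_size, &magic2, sizeof(magic2));
}

/*
 * Model of sigreturn: look for MAGIC2 at the kernel's own user_size.
 * Returns 1 if the extended state would be accepted, 0 for the
 * FX-only fallback.
 */
static int model_check_xstate(const uint8_t *frame, uint32_t user_size)
{
	struct fpx_sw_bytes sw;
	uint32_t magic2;

	memcpy(&sw, frame + SW_BYTES_OFFSET, sizeof(sw));
	if (sw.magic1 != FP_XSTATE_MAGIC1)
		return 0;

	memcpy(&magic2, frame + user_size, sizeof(magic2));
	return magic2 == FP_XSTATE_MAGIC2;
}
```

In this model, a frame written with one user_size and checked with the same value passes, while a zero-filled frame written with a smaller size (say 832) and checked against a larger one (say 2688) fails the MAGIC2 test.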

Because the format is hardware-defined, arbitrary size mismatches should not be allowed. The sigframe should match the CPU-defined XSAVE layout. So the change in fact strengthens the sanity check.

> Furthermore, this introduces a critical issue for checkpoint/restore
> tools like CRIU. If a process is checkpointed while inside a signal
> handler, its stack contains a signal frame formatted according to the
> source host's xstate capabilities. If that process is later restored on
> a destination host with larger xstate capabilities (e.g., a newer CPU
> with more features enabled, resulting in a larger fpstate->user_size),
> the kernel will look for FP_XSTATE_MAGIC2 at the destination host's
> larger user_size offset instead of the offset encoded in the frame's
> fx_sw->xstate_size. This causes the magic2 check to fail, forcing
> sigreturn to silently fall back to "FX-only" mode.

It seems that userspace could translate the XSAVE buffer from CPU A's format to CPU B's format during restore. If so, the frame can be consistent with the destination system without modifying fx_sw->xstate_size, and the kernel-side validation would continue to work as intended.
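
As a rough sketch of what such a translation might look like, assuming the standard (non-compacted) XSAVE format on both hosts: struct xlayout and xsave_translate() are hypothetical, and each layout is assumed to have been captured from CPUID on the respective host. Only the XSAVE data is handled here; the software bytes and the MAGIC2 placement are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XSAVE_LEGACY_SIZE	512u
#define XSAVE_HDR_SIZE		64u
#define XSTATE_BV_OFFSET	512u	/* first 8 bytes of the XSAVE header */

/* Hypothetical description of one host's standard-format layout. */
struct xlayout {
	uint64_t xfeatures;	/* enabled user features */
	uint32_t offset[64];	/* per-component offset */
	uint32_t size[64];	/* per-component size */
	uint32_t total;		/* total size of the XSAVE area */
};

/*
 * Copy an XSAVE image from the source layout into the destination
 * layout. Returns 0 on success, -1 if a component that is in use
 * cannot be represented on the destination. dst must hold dl->total
 * bytes and is left untouched on failure.
 */
static int xsave_translate(uint8_t *dst, const struct xlayout *dl,
			   const uint8_t *src, const struct xlayout *sl)
{
	uint64_t in_use;

	assert(dst && dl && src && sl);
	memcpy(&in_use, src + XSTATE_BV_OFFSET, sizeof(in_use));

	/* Every in-use component must be known to both layouts. */
	if ((in_use & ~sl->xfeatures) || (in_use & ~dl->xfeatures))
		return -1;
	for (int i = 2; i < 64; i++) {
		if ((in_use & (1ULL << i)) && sl->size[i] != dl->size[i])
			return -1;
	}

	/* The legacy region and the header sit at fixed offsets. */
	memset(dst, 0, dl->total);
	memcpy(dst, src, XSAVE_LEGACY_SIZE + XSAVE_HDR_SIZE);

	/* Extended components move to the destination's offsets. */
	for (int i = 2; i < 64; i++) {
		if (!(in_use & (1ULL << i)))
			continue;
		memcpy(dst + dl->offset[i], src + sl->offset[i], sl->size[i]);
	}
	return 0;
}
```

Refusing the translation when an in-use component is missing on the destination is a design choice of this sketch; silently dropping that state would be the alternative.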

Thanks,
Chang