Re: [PATCH 09/10] x86/fpu: Allow restoring signal frames with larger xstate_size
From: Andrei Vagin
Date: Tue Jun 30 2026 - 20:48:34 EST
On Tue, Jun 30, 2026 at 12:24 PM Chang S. Bae <chang.seok.bae@xxxxxxxxx> wrote:
>
> On 6/15/2026 12:37 PM, Andrei Vagin wrote:
> > The kernel previously enforced that the xstate_size in the signal frame
> > must not exceed the current task's fpstate->user_size. This prevents
> > restoring signal frames that were saved on another CPU (in case of
> > container/process migration) with a different (larger) set of enabled
> > xstate features, even if the features to be restored are compatible.
> >
> > Relax this restriction by removing the strict check against user_size.
> > The previous commit introduced infrastructure to calculate the actual
> > required size based on the intersection of requested and supported
> > features. We now rely on that validation and only require that the
> > provided xstate_size is sufficient for the active features.
>
> I appreciate the effort to document the contract, add regression tests,
> and tighten the validation logic along with the revert fix so far in
> this series.
>
> But I'm wondering this bit of relaxing the checker is really necessary
> at this point.
Hi Chang,

Thanks for reviewing this.

In our previous discussion, I explained why the state translation
approach doesn't solve the problem: in-flight signal frames on user
stacks cannot be translated. When a process is checkpointed while
handling a signal, there is an in-flight signal frame residing directly
on the thread's user stack. There is no way for userspace tools to
reliably discover arbitrary in-flight signal frames embedded in stack
memory.

We want to support migrating processes that use only features available
cluster-wide from newer to older CPUs. If we
migrate from a newer CPU (larger default `user_size`) to an older CPU
(smaller `user_size`), enforcing `xstate_size <= fpstate->user_size` in
the kernel unconditionally rejects valid signal frames.

An FPU translation mechanism already exists in CRIU to restore the
current per-thread FPU states, so workloads can already be migrated
between different CPUs today. But there is always a risk that a process
is migrated at the wrong moment (while running inside a signal handler).
Without this change, the size check then fails on that in-flight stack
frame, which can corrupt the process state. This patch eliminates that
risk.
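
To make the rejected case concrete, here is a minimal userspace sketch
of the two rules. It is not the kernel code: `struct task_limits`,
`frame_ok_strict()` and `frame_ok_relaxed()` are hypothetical names, and
the required size is assumed to have been computed already from the
intersection of requested and supported features.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-task limits (not struct fpstate). */
struct task_limits {
	uint64_t user_xfeatures; /* features enabled for this task */
	size_t   user_size;      /* default user XSAVE size on this CPU */
};

/* Old rule: reject any frame that declares more than user_size. */
static bool frame_ok_strict(const struct task_limits *t, size_t xstate_size)
{
	return xstate_size <= t->user_size;
}

/*
 * Relaxed rule: the frame only has to be large enough to hold the
 * features that will actually be restored. required_size is assumed
 * to come from the requested/supported feature intersection.
 */
static bool frame_ok_relaxed(size_t xstate_size, size_t required_size)
{
	return xstate_size >= required_size;
}
```

With illustrative numbers, a frame saved on an AVX-512 machine might
declare an xstate_size of 2696 while the target only has x87/SSE/AVX
enabled (user_size of 832). The strict rule rejects that frame, while the
relaxed rule accepts it because 832 bytes are enough for the features
that get restored.
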
>
> With APX, userspace can no longer assume that a higher XSTATE component
> number implies a higher offset within the XSAVE image. Going forward,
> migration software will likely need a more robust approach that
> interprets the layout and transforms the image when moving between
> machines with different layouts. With such translation, maybe further
> relaxing the kernel-side checker isn't that needed.
I think APX was designed to preserve backward compatibility cleanly, and
I don't think we rely on the assumption that a higher XSTATE component
number implies a higher offset within the XSAVE image.
`xstate_calculate_size()` already finds the topmost feature by offset. The
only reason the kernel needs to know the required xstate size is to
correctly pre-fault the user memory buffer.
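
As a rough illustration of "topmost feature by offset", here is a
userspace sketch. It is not the actual `xstate_calculate_size()`
implementation: `struct xcomp`, `layout[]` and `required_xstate_size()`
are hypothetical, and the offsets and sizes are only typical Intel
values for the standard (non-compacted) format. Real values have to be
read from CPUID leaf 0xD.

```c
#include <stddef.h>
#include <stdint.h>

/* One XSAVE component in the standard (non-compacted) format. */
struct xcomp {
	unsigned int nr;     /* XSTATE component number (bit in xstate_bv) */
	size_t       offset; /* byte offset within the XSAVE image */
	size_t       size;   /* byte size of the component */
};

/*
 * Illustrative layout only (assumption, not read from hardware).
 * APX (component 19) sits in the space formerly used by MPX, so it has
 * the highest component number but not the highest offset.
 */
static const struct xcomp layout[] = {
	{  2,  576,  256 }, /* AVX */
	{  5, 1088,   64 }, /* AVX-512 opmask */
	{  6, 1152,  512 }, /* AVX-512 ZMM_Hi256 */
	{  7, 1664, 1024 }, /* AVX-512 Hi16_ZMM */
	{  9, 2688,    8 }, /* PKRU */
	{ 19,  960,  128 }, /* APX */
};

/* 512-byte legacy area plus 64-byte XSAVE header. */
#define XSAVE_LEGACY_AND_HEADER 576

/* Required size = end of the topmost selected component, by offset. */
static size_t required_xstate_size(uint64_t features)
{
	size_t end = XSAVE_LEGACY_AND_HEADER;

	for (size_t i = 0; i < sizeof(layout) / sizeof(layout[0]); i++) {
		const struct xcomp *c = &layout[i];

		if (!(features & (1ULL << c->nr)))
			continue;
		if (c->offset + c->size > end)
			end = c->offset + c->size;
	}
	return end;
}
```

Because the loop takes the maximum of offset + size and never looks at
which component number is highest, adding APX to an AVX-512 feature set
does not change the result in this layout.
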
While APX reuses the MPX space in the xsave state, it introduces a new
feature bit to indicate the presence of its state, which is really what
matters. The actual register state that gets restored depends on
`task_xfeatures` and the header's `xstate_bv`. Any attempt by XRSTOR to
restore a header with unsupported feature bits set in `xstate_bv`
generates a #GP fault. The fault is caught cleanly in
`restore_fpregs_from_user()`, the restore fails, and the task gets a
SIGSEGV.
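
A simplified model of that behaviour, as a sketch only: the helper names
are hypothetical, `enabled` stands for the feature bits the CPU has
enabled (XCR0), and only the `xstate_bv` check is modelled, not the other
header checks XRSTOR performs.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Models one XRSTOR rule for the standard format: a header whose
 * xstate_bv has a bit set that is not enabled on this CPU faults.
 */
static bool xrstor_would_fault(uint64_t xstate_bv, uint64_t enabled)
{
	return (xstate_bv & ~enabled) != 0;
}

/*
 * Components that are actually loaded from the buffer: present in the
 * header and also requested by the task's feature mask.
 */
static uint64_t features_loaded(uint64_t xstate_bv, uint64_t task_xfeatures)
{
	return xstate_bv & task_xfeatures;
}
```

So a frame carrying the APX bit (bit 19) is refused on a CPU without
APX, regardless of how large the buffer is, while a frame that only
carries x87/SSE/AVX bits restores normally.
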
I completely understand that some features may be deprecated in the
future, but I still believe that for non-deprecated features, component
offsets should be fixed across all CPUs within a vendor's family. If
this assumption is ever broken, even standard KVM live migration of
guest vCPUs would break.

Sorry if I missed something. Maybe you can give an example of when
this change would work against us?

Thanks,
Andrei