Re: objtool: undefined stack state in folio_zero_user()

From: Peter Zijlstra

Date: Tue Jun 30 2026 - 13:45:39 EST

On Tue, Jun 30, 2026 at 04:14:35PM +0200, Alexander Potapenko wrote:
> > diff --git a/tools/objtool/check.c b/tools/objtool/check.c
> > index 10b18cf9c360..53a67b322856 100644
> > --- a/tools/objtool/check.c
> > +++ b/tools/objtool/check.c
> > @@ -3149,8 +3149,25 @@ static int update_cfi_state(struct instruction *insn,
> > /* drap: mov disp(%rbp), %reg */
> > restore_reg(cfi, op->dest.reg);
> >
> > + } else if (op->src.reg == CFI_SP &&
> > + regs[CFI_SP].base == CFI_CFA &&
> > + op->src.offset == regs[CFI_SP].offset + cfi->stack_size) {
> > +
> > + /*
> > + * Clang RSP musical chains:
>
> s/chains/chairs if you're going to submit that ;)

:-)

> I am not sure we can do much on the compiler side here.
> KMSAN just heavily increases register pressure, and this is how the
> backend handles it.
> We can't even influence it from the middle-end where the instrumentation occurs.
> I remember Clang having more than one regallocator (we used to fall
> back to PBQP for some huge files when instrumenting Chrome), but
> switching to the non-default one will probably open a can of worms.

Something in that compiler is smoking very potent dope.

The code I have here has the form:

mov %rsp, %rcx
1: mov %rcx, %rsp
...
mov %rsp, 0x68(%rsp)
...
mov 0x68(%rsp), %rcx
test
je 1b
mov %rcx, %r12
...
mov %r12, %rcx
jmp 1b

Which is really, really stupid: it spills the rsp value to the stack,
only to load it straight back into another register. Simply doing:

mov %rsp, %rcx
1: mov %rcx, %rsp
...
mov %rsp, %rcx
test
je 1b
mov %rcx, %r12
...
mov %r12, %rcx
jmp 1b

Would have made it so much better. But I'm not at all sure why it is
playing these rsp games to begin with; that code just doesn't make much
sense to me.

Gemini is suggesting it is:

The rsp manipulation occurs for two primary reasons:

- Strict Stack Alignment: Most Application Binary Interfaces (ABIs),
such as the System V AMD64 ABI, require the stack pointer (rsp) to
be 16-byte aligned (rsp mod 16 = 0) immediately before a function
call. In functions with highly optimized local variables or
dynamically allocated stack memory using alloca(), the stack pointer
can easily drift. Clang temporarily aligns the stack by rounding it
down, but must stash the original rsp to restore it properly after
the tracking function completes.

- Dynamic Shadow/Origin Mapping: The function __msan_chain_origin
modifies origin metadata. Passing localized stack data or updating
origin chains can cause unpredictable frame offsets or displacement
inside the compiler's temporary spilling phase. Stashing the stack
pointer guarantees that the instrumentation code will not corrupt the
compiler-generated local variables if it relies on a consistent frame
pointer.

But if this is the former (alignment), then the compiler has evidently
already determined that the stack is properly aligned, since no actual
alignment instructions are issued. At that point it could elide the
restore too, but it doesn't.

Gemini further elaborates:

The Call Site "Opaque Wrap"

When the KMSAN pass runs, it treats the injection of
__msan_chain_origin as a highly specific helper callback rather than a
standard C function call.

To prevent the compiler's backend from optimizing away or rearranging
the timing of this tracking, the instrumentation framework wraps the
call inside an execution envelope that dictates: "Save the CPU state,
call the hook, restore the CPU state."

Even if the backend later calculates that no alignment modification is
needed, the instruction slots for the save/restore actions have
already been allocated in the compiler's intermediate representation
(LLVM IR). Because x86-64 requires rsp tracking for non-leaf
functions, LLVM assigns a virtual register to stash rsp.

...

When the compiler’s register allocator reaches the instruction
sequence to save rsp, it discovers it has zero free registers
available to hold the value temporarily.

Its fallback mechanism for a lack of registers is to "spill" the value
to memory. Because there is no frame pointer (rbp), the only way it
knows how to address memory is relative to rsp. It emits the command to
copy rsp to [rsp + offset], unknowingly creating the circular logic
failure.

That last point in particular: surely it can be taught to detect this
circular logic of storing rsp relative to rsp. Additionally, the moment
it realizes it doesn't need to re-align the stack (and it does realize
that), it can also kill the restore.

Also, there is always a 'free' register to store RSP in; it is called
RSP :-)

Now, clearly I don't actually know much about LLVM internals, but this is
all quite insane.