Re: [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops

From: Josh Poimboeuf
Date: Tue Apr 15 2025 - 10:40:05 EST

Next message: Aleksandr Nogikh: "Re: Latest clang versions fail to compile CONFIG_X86_X32_ABI=y"
Previous message: Caleb Sander Mateos: "Re: [PATCH] nvme: Removing deprecated strncpy()"
In reply to: Peter Zijlstra: "Re: [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops"
Next in thread: Peter Zijlstra: "[PATCH 5/6] x86_64,hyperv: Use direct call to hypercall-page"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Tue, Apr 15, 2025 at 09:44:21AM +0200, Peter Zijlstra wrote:
> On Mon, Apr 14, 2025 at 03:36:50PM -0700, Josh Poimboeuf wrote:
> > On Mon, Apr 14, 2025 at 01:11:43PM +0200, Peter Zijlstra wrote:
> > > Since there is only a single fastop() function, convert the FASTOP
> > > stuff from CALL_NOSPEC+RET to JMP_NOSPEC+JMP, avoiding the return
> > > thunks and all that jazz.
> > >
> > > Specifically FASTOPs rely on the return thunk to preserve EFLAGS,
> > > which not all of them can trivially do (call depth tracing suffers
> > > here).
> > >
> > > Objtool strenuously complains about things, therefore fix up the
> > > various problems:
> > >
> > > - indirect call without a .rodata, fails to determine JUMP_TABLE,
> > > add an annotation for this.
> > > - fastop functions fall through, create an exception for this case
> > > - unreachable instruction after fastop_return, save/restore
> >
> > I think this breaks unwinding. Each of the individual fastops inherits
> > fastop()'s stack but the ORC doesn't reflect that.
>
> I'm not sure I understand. There is only the one location, and we
> simply save/restore the state around the one 'call'.

The problem isn't fastop() but rather the tiny functions it "calls".
Each of those is marked STT_FUNC, so it gets its own ORC data saying the
CFA is at RSP+8, i.e. the return address is at the top of the stack.

Changing from CALL_NOSPEC+RET to JMP_NOSPEC+JMP means the return address
isn't pushed before the branch, so the fastops effectively become part
of fastop() rather than separate functions. RSP+8 is then only correct
if fastop() happens not to have pushed anything to the stack before the
indirect JMP.
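
To make that concrete, here is a toy model of the stack in plain C. It is
not kernel code and says nothing about what fastop() actually pushes: every
name and constant below (toy_stack, CALLER_RET, SITE_RET, SAVED_REG, ...) is
made up for illustration. It only applies the function-entry rule (return
address at the top of the stack) after either a CALL or a JMP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: a downward-growing stack of 8-byte slots. */
#define STACK_SLOTS 16

/* Made-up marker values so the slots are easy to tell apart. */
#define CALLER_RET 0x1000u	/* return address into fastop()'s caller */
#define SITE_RET   0x2000u	/* address right after the branch in fastop() */
#define SAVED_REG  0x3000u	/* base value for words fastop() pushed */

enum branch_kind { BRANCH_CALL, BRANCH_JMP };

struct toy_stack {
	uint64_t slot[STACK_SLOTS];
	size_t sp;		/* index of the current top-of-stack slot */
};

static void toy_push(struct toy_stack *s, uint64_t val)
{
	assert(s->sp > 0);	/* the toy stack must not overflow */
	s->slot[--s->sp] = val;
}

/*
 * Function-entry rule: CFA = RSP+8 and the return address lives at
 * CFA-8, which is simply the word RSP points at.
 */
static uint64_t toy_entry_rule_ret_addr(const struct toy_stack *s)
{
	return s->slot[s->sp];
}

/*
 * Build the stack as seen on entry to a fastop and report what the
 * entry rule would pick as the return address. 'pushed' is the
 * (hypothetical) number of words fastop() pushed before branching.
 */
static uint64_t toy_unwind_in_fastop(enum branch_kind kind, unsigned int pushed)
{
	struct toy_stack s = { .sp = STACK_SLOTS };
	unsigned int i;

	toy_push(&s, CALLER_RET);		/* pushed by fastop()'s caller */
	for (i = 0; i < pushed; i++)
		toy_push(&s, SAVED_REG + i);	/* fastop()'s own pushes */
	if (kind == BRANCH_CALL)
		toy_push(&s, SITE_RET);		/* only CALL pushes this */

	return toy_entry_rule_ret_addr(&s);
}
```

In this model, toy_unwind_in_fastop(BRANCH_CALL, 2) finds SITE_RET, and
toy_unwind_in_fastop(BRANCH_JMP, 0) still finds a real return address
(CALLER_RET), but toy_unwind_in_fastop(BRANCH_JMP, 2) hands back a saved
word instead of a return address.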

The addresses aren't stored in a .rodata jump table, so objtool doesn't
know the control flow. Even if we made them non-FUNC, objtool wouldn't
be able to transfer the stack state from fastop() to them.

--
Josh
