Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

From: Josh Poimboeuf
Date: Tue Sep 19 2017 - 21:47:19 EST


On Tue, Sep 19, 2017 at 03:25:59PM -0700, Alexander Potapenko wrote:
> On Tue, Sep 19, 2017 at 2:55 PM, Alexander Potapenko <glider@xxxxxxxxxx> wrote:
> > On Tue, Sep 19, 2017 at 11:45 AM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> >> For inline asm statements which have a CALL instruction, we list the
> >> stack pointer as a constraint to convince GCC to ensure the frame
> >> pointer is set up first:
> >>
> >> static inline void foo()
> >> {
> >> register void *__sp asm(_ASM_SP);
> >> asm("call bar" : "+r" (__sp))
> >> }
> >>
> >> Unfortunately, that pattern causes clang to corrupt the stack pointer.
> >>
> >> There's actually an easier way to achieve the same goal in GCC, without
> >> causing trouble for clang. If we declare the stack pointer register
> >> variable as a global variable, and remove the constraint altogether,
> >> that convinces GCC to always set up the frame pointer before inserting
> >> *any* inline asm.
> >>
> >> It basically acts as if *every* inline asm statement has a CALL
> >> instruction. It's a bit overkill, but the performance impact should be
> >> negligible.
> >>
> >> Here are the vmlinux .text size differences with the following configs
> >> on GCC:
> >>
> >> - defconfig
> >> - defconfig without frame pointers
> >> - Fedora distro config
> >> - Fedora distro config without frame pointers
> >>
> >> defconfig defconfig-nofp distro distro-nofp
> >> before 9796300 9466764 9076191 8789745
> >> after 9796941 9462859 9076381 8785325
> >>
> >> With frame pointers, the text size increases slightly. Without frame
> >> pointers, the text size decreases, and a little more significantly.
> >>
> >> Reported-by: Matthias Kaehlcke <mka@xxxxxxxxxxxx>
> >> Signed-off-by: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>
> >> ---
> >> arch/x86/include/asm/alternative.h | 3 +--
> >> arch/x86/include/asm/asm.h | 9 ++++++++
> >> arch/x86/include/asm/mshyperv.h | 27 ++++++++++--------------
> >> arch/x86/include/asm/paravirt_types.h | 14 ++++++------
> >> arch/x86/include/asm/preempt.h | 15 +++++--------
> >> arch/x86/include/asm/processor.h | 6 ++----
> >> arch/x86/include/asm/rwsem.h | 6 +++---
> >> arch/x86/include/asm/uaccess.h | 5 ++---
> >> arch/x86/include/asm/xen/hypercall.h | 5 ++---
> >> arch/x86/kvm/emulate.c | 3 +--
> >> arch/x86/kvm/vmx.c | 4 +---
> >> arch/x86/mm/fault.c | 3 +--
> >> tools/objtool/Documentation/stack-validation.txt | 20 +++++++++++++-----
> >> 13 files changed, 60 insertions(+), 60 deletions(-)
> >>
> >> diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
> >> index 1b020381ab38..7aeb1cef4204 100644
> >> --- a/arch/x86/include/asm/alternative.h
> >> +++ b/arch/x86/include/asm/alternative.h
> >> @@ -218,10 +218,9 @@ static inline int alternatives_text_reserved(void *start, void *end)
> >> #define alternative_call_2(oldfunc, newfunc1, feature1, newfunc2, feature2, \
> >> output, input...) \
> >> { \
> >> - register void *__sp asm(_ASM_SP); \
> >> asm volatile (ALTERNATIVE_2("call %P[old]", "call %P[new1]", feature1,\
> >> "call %P[new2]", feature2) \
> >> - : output, "+r" (__sp) \
> >> + : output \
> >> : [old] "i" (oldfunc), [new1] "i" (newfunc1), \
> >> [new2] "i" (newfunc2), ## input); \
> >> }
> >> diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
> >> index 676ee5807d86..ff8921d4615b 100644
> >> --- a/arch/x86/include/asm/asm.h
> >> +++ b/arch/x86/include/asm/asm.h
> >> @@ -132,4 +132,13 @@
> >> /* For C file, we already have NOKPROBE_SYMBOL macro */
> >> #endif
> >>
> >> +#ifndef __ASSEMBLY__
> >> +/*
> >> + * This variable declaration has the side effect of forcing GCC to always set
> >> + * up the frame pointer before inserting any inline asm. This is necessary
> >> + * because some inline asm statements have CALL instructions.
> >> + */
> >> +register unsigned int __sp_unused asm("esp");

> > Shouldn't this be "asm(_ASM_SP)"?

> Answering my own question, this can't be _ASM_SP because of the
> realmode code, which is built with -m16 whereas _ASM_SP is "rsp".
> The patch works fine with "esp" though - most certainly declaring a
> ESP variable is enough to reserve the whole RSP in an x86_64 build.

Right, clang failing to build the realmode code was exactly why I did
that.

--
Josh