Re: [PATCH 2/3] x86: entry_64.S: use PUSH insns to build pt_regs on stack

From: Andy Lutomirski
Date: Wed Mar 18 2015 - 17:02:05 EST


On Wed, Mar 18, 2015 at 12:47 PM, Denys Vlasenko <dvlasenk@xxxxxxxxxx> wrote:
> We lose a number of large insns there:
>
>    text    data     bss     dec     hex filename
>    9863       0       0    9863    2687 entry_64_before.o
>    9671       0       0    9671    25c7 entry_64.o
>
> More importantly, we convert two "MOVQ $imm,off(%rsp)" insns to "PUSH $imm"
> (the ones which fill pt_regs->cs,ss).
>
> Before this patch, placing them on the fast path slowed it down by two cycles:
> this form of MOV is very large (12 bytes), and it probably reduces decode
> bandwidth to one insn per cycle when the decoder meets it.
> That's why they lived in FIXUP_TOP_OF_STACK instead (away from the hot path).

Does that mean that this has zero performance impact, or is it
actually a speedup?

Also, I would add something to the description explaining that this
specifically affects the syscall path, that we now populate SS, CS,
and RCX unconditionally, and that we therefore don't need to populate
them in FIXUP_TOP_OF_STACK.
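
(Concretely, if I'm reading the rest of the hunk right, the fast path
now builds the iret frame with something like the following — a sketch
reconstructed from your description, not quoted from the patch:

        pushq_cfi $__USER_DS                    /* pt_regs->ss */
        pushq_cfi PER_CPU_VAR(rsp_scratch)      /* pt_regs->sp */
        pushq_cfi %r11                          /* pt_regs->flags */
        pushq_cfi $__USER_CS                    /* pt_regs->cs */
        pushq_cfi %rcx                          /* pt_regs->ip */

so SS, CS and the syscall return RIP are written on every entry, and
FIXUP_TOP_OF_STACK no longer has to fill them in.)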

>
> "PUSH $imm" is a small 2-byte insn. Moving it to fast path does not slow it down,
> in my measurements.
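
FWIW, the encodings make the size difference concrete. A sketch, using
__USER_DS = 0x2b and the SS slot at offset 0xa0 as the example (both the
cs and ss offsets are larger than 127, so the MOV needs a disp32):

        movq $0x2b,0xa0(%rsp)   # 48 c7 84 24 a0 00 00 00 2b 00 00 00 (12 bytes)
        pushq $0x2b             # 6a 2b                               ( 2 bytes)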
>
> Signed-off-by: Denys Vlasenko <dvlasenk@xxxxxxxxxx>
> CC: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> CC: Steven Rostedt <rostedt@xxxxxxxxxxx>
> CC: Ingo Molnar <mingo@xxxxxxxxxx>
> CC: Borislav Petkov <bp@xxxxxxxxx>
> CC: "H. Peter Anvin" <hpa@xxxxxxxxx>
> CC: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
> CC: Oleg Nesterov <oleg@xxxxxxxxxx>
> CC: Frederic Weisbecker <fweisbec@xxxxxxxxx>
> CC: Alexei Starovoitov <ast@xxxxxxxxxxxx>
> CC: Will Drewry <wad@xxxxxxxxxxxx>
> CC: Kees Cook <keescook@xxxxxxxxxxxx>
> CC: x86@xxxxxxxxxx
> CC: linux-kernel@xxxxxxxxxxxxxxx
> ---
> arch/x86/kernel/entry_64.S | 46 +++++++++++++++++++++++++---------------------
> 1 file changed, 25 insertions(+), 21 deletions(-)
>
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index ecb68d8..4647c1d 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -126,11 +126,8 @@ ENDPROC(native_usergs_sysret64)
> * manipulation.
> */
> .macro FIXUP_TOP_OF_STACK tmp offset=0
> - movq $__USER_DS,SS+\offset(%rsp)
> - movq $__USER_CS,CS+\offset(%rsp)
> - movq RIP+\offset(%rsp),\tmp /* get rip */
> - movq \tmp,RCX+\offset(%rsp) /* copy it to rcx as sysret would do */
> - movq EFLAGS+\offset(%rsp),\tmp /* ditto for rflags->r11 */
> + /* copy flags to r11 as sysret would do */
> + movq EFLAGS+\offset(%rsp),\tmp
> movq \tmp,R11+\offset(%rsp)
> .endm
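
(Side note for readers of the archive: sysret reloads the user RIP from
%rcx and RFLAGS from %r11, which is why these pt_regs slots have to
mirror ->ip and ->flags. A rough sketch of the consuming side, not the
actual exit code:

        movq RIP(%rsp),%rcx     /* sysret takes the return RIP from %rcx */
        movq EFLAGS(%rsp),%r11  /* ... and the return RFLAGS from %r11 */
        movq RSP(%rsp),%rsp
        USERGS_SYSRET64

With flags mirrored into R11 by the macro above, that's all the fixup
that's still needed.)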
>
> @@ -236,27 +233,34 @@ ENTRY(system_call)
> */
> GLOBAL(system_call_after_swapgs)
>
> - /*
> - * We use 'rsp_scratch' as a scratch register, hence this block must execute
> - * atomically in the face of possible interrupt-driven task preemption,
> - * so we can enable interrupts only after we're done with using rsp_scratch:
> - */
> movq %rsp,PER_CPU_VAR(rsp_scratch)
> movq PER_CPU_VAR(kernel_stack),%rsp
> - ALLOC_PT_GPREGS_ON_STACK 6*8 /* 6*8: space for orig_ax and iret frame */
> - movq %rcx,RIP(%rsp)
> - movq PER_CPU_VAR(rsp_scratch),%rcx
> - movq %r11,EFLAGS(%rsp)
> - movq %rcx,RSP(%rsp)
> +
> + /* Construct struct pt_regs on stack */
> + pushq_cfi $__USER_DS /* pt_regs->ss */
> + pushq_cfi PER_CPU_VAR(rsp_scratch) /* pt_regs->sp */
> /*
> - * No need to follow this irqs off/on section - it's straight
> - * and short:
> + * We use 'rsp_scratch' as a scratch register, hence the block above
> + * must execute atomically in the face of possible interrupt-driven
> + * task preemption. We must enable interrupts only after we're done
> + * with using rsp_scratch:
> */

I think that you lost information here, but only because it was really
poorly written. If you mentally replace "follow" with "trace", then
it should make sense. That is, I would change your comment to:

+ * We use 'rsp_scratch' as a scratch register, hence the block above
+ * must execute atomically in the face of possible interrupt-driven
+ * task preemption. We must enable interrupts only after we're done
+ * with using rsp_scratch. Don't bother tracing the interrupt
+ * re-enable, because the IRQ-off code is very short and we didn't
+ * trace the fact that syscall disabled interrupts in the first place.


> ENABLE_INTERRUPTS(CLBR_NONE)
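
(For reference, tracing the re-enable would look roughly like the usual
pattern — a sketch:

        TRACE_IRQS_ON                   /* record the re-enable for irq tracing */
        ENABLE_INTERRUPTS(CLBR_NONE)

but since the implicit interrupt disable done by the syscall insn was
never traced either, skipping TRACE_IRQS_ON here keeps the tracing
state consistent.)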

--Andy