Re: [tip:x86/pti] x86/entry/64: Introduce the PUSH_AND_CLEAN_REGS macro

From: Denys Vlasenko
Date: Mon Feb 12 2018 - 08:44:00 EST

Next message: Zhu Lingshan: "[PATCH v2] .gitignore: ignore ASN.1 auto generated files"
Previous message: Igor Stoppa: "Re: [PATCH 4/6] Protectable Memory"
In reply to: David Laight: "RE: [tip:x86/pti] x86/entry/64: Introduce the PUSH_AND_CLEAN_REGS macro"
Next in thread: Linus Torvalds: "Re: [tip:x86/pti] x86/entry/64: Introduce the PUSH_AND_CLEAN_REGS macro"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 02/12/2018 02:36 PM, David Laight wrote:

> From: Denys Vlasenko
> Sent: 12 February 2018 13:29
> ...
>>>
>>> x86/entry/64: Introduce the PUSH_AND_CLEAN_REGS macro
>>>
>>> Those instances where ALLOC_PT_GPREGS_ON_STACK is called just before
>>> SAVE_AND_CLEAR_REGS can trivially be replaced by PUSH_AND_CLEAN_REGS.
>>> This macro uses PUSH instead of MOV and should therefore be faster, at
>>> least on newer CPUs.
> ...
>>> Link: http://lkml.kernel.org/r/20180211104949.12992-5-linux@xxxxxxxxxxxxxxxxxxxx
>>> Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
>>> ---
>>> arch/x86/entry/calling.h | 36 ++++++++++++++++++++++++++++++++++++
>>> arch/x86/entry/entry_64.S | 6 ++----
>>> 2 files changed, 38 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
>>> index a05cbb8..57b1b87 100644
>>> --- a/arch/x86/entry/calling.h
>>> +++ b/arch/x86/entry/calling.h
>>> @@ -137,6 +137,42 @@ For 32-bit we have the following conventions - kernel is built with
>>> UNWIND_HINT_REGS offset=\offset
>>> .endm
>>>
>>> + .macro PUSH_AND_CLEAR_REGS
>>> + /*
>>> + * Push registers and sanitize registers of values that a
>>> + * speculation attack might otherwise want to exploit. The
>>> + * lower registers are likely clobbered well before they
>>> + * could be put to use in a speculative execution gadget.
>>> + * Interleave XOR with PUSH for better uop scheduling:
>>> + */
>>> + pushq %rdi /* pt_regs->di */
>>> + pushq %rsi /* pt_regs->si */
>>> + pushq %rdx /* pt_regs->dx */
>>> + pushq %rcx /* pt_regs->cx */
>>> + pushq %rax /* pt_regs->ax */
>>> + pushq %r8 /* pt_regs->r8 */
>>> + xorq %r8, %r8 /* nospec r8 */
>>
>> xorq's are slower than xorl's on Silvermont/Knights Landing.
>> I propose using xorl instead.
>
> Does using movq to copy the first zero to the other registers make
> the code any faster?
>
> ISTR mov reg-reg is often implemented as a register rename rather than an
> alu operation.

xorl is implemented in register rename as well. Just, for some reason,
xorq did not get the same treatment on those CPUs.
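
For readers wondering why xorl is a safe drop-in for xorq here: in 64-bit
mode, a write to a 32-bit register zero-extends into the full 64-bit
register, so "xorl %r8d, %r8d" leaves all of %r8 cleared, exactly as
"xorq %r8, %r8" does. Below is a minimal user-space sketch of that
equivalence, not kernel code. The helper names clear_with_xorl() and
clear_with_xorq() are made up for illustration, the inline asm assumes a
GCC- or Clang-compatible compiler, and a plain C fallback is used on other
targets. It shows only the architectural result; it says nothing about the
relative speed on Silvermont or Knights Landing.

```c
#include <stdint.h>

/*
 * Hypothetical helper: clear a 64-bit value using the 32-bit XOR form.
 * The %k0 operand modifier prints the 32-bit name of the register that
 * holds operand 0 (for example %r8d instead of %r8). Writing the 32-bit
 * register zero-extends into the upper 32 bits in 64-bit mode.
 */
static uint64_t clear_with_xorl(uint64_t v)
{
#if defined(__x86_64__) && defined(__GNUC__)
	__asm__ ("xorl %k0, %k0" : "+r" (v) : : "cc");
#else
	v = 0;	/* fallback so the sketch still builds on other targets */
#endif
	return v;
}

/* Hypothetical helper: the same operation with the 64-bit XOR form. */
static uint64_t clear_with_xorq(uint64_t v)
{
#if defined(__x86_64__) && defined(__GNUC__)
	__asm__ ("xorq %0, %0" : "+r" (v) : : "cc");
#else
	v = 0;	/* fallback so the sketch still builds on other targets */
#endif
	return v;
}
```

Calling either helper with a value that has bits set in the upper half,
such as 0xdeadbeefcafef00d, returns 0 in both cases, which is why the
proposed change to the macro would not alter what ends up in the register.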
