[tip:x86/asm] x86/asm/entry: Create and use a ' TOP_OF_KERNEL_STACK_PADDING' macro

From: tip-bot for Andy Lutomirski
Date: Mon Mar 16 2015 - 08:09:24 EST


Commit-ID: 3bfef48e27863ebbd47d68a3f4a081d2cb5286aa
Gitweb: http://git.kernel.org/tip/3bfef48e27863ebbd47d68a3f4a081d2cb5286aa
Author: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
AuthorDate: Tue, 10 Mar 2015 11:05:58 -0700
Committer: Ingo Molnar <mingo@xxxxxxxxxx>
CommitDate: Mon, 16 Mar 2015 11:05:30 +0100

x86/asm/entry: Create and use a 'TOP_OF_KERNEL_STACK_PADDING' macro

x86_32, unlike x86_64, pads the top of the kernel stack, because the
hardware stack frame formats are variable in size.

Document this padding and give it a name.

This should make no change whatsoever to the compiled kernel
image. It also doesn't fix any of the current bugs in this area.

Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
Acked-by: Denys Vlasenko <dvlasenk@xxxxxxxxxx>
Cc: Borislav Petkov <bp@xxxxxxxxx>
Cc: H. Peter Anvin <hpa@xxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Link: http://lkml.kernel.org/r/02bf2f54b8dcb76a62a142b6dfe07d4ef7fc582e.1426009661.git.luto@xxxxxxxxxxxxxx
[ Fixed small details, such as a missed magic constant in entry_32.S pointed out by Denys Vlasenko. ]
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
---
arch/x86/include/asm/processor.h | 3 ++-
arch/x86/include/asm/thread_info.h | 27 +++++++++++++++++++++++++++
arch/x86/kernel/entry_32.S | 2 +-
3 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index c77605d..554da61 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -848,7 +848,8 @@ extern unsigned long thread_saved_pc(struct task_struct *tsk);
#define task_pt_regs(task) \
({ \
struct pt_regs *__regs__; \
- __regs__ = (struct pt_regs *)(KSTK_TOP(task_stack_page(task))-8); \
+ __regs__ = (struct pt_regs *)(KSTK_TOP(task_stack_page(task)) - \
+ TOP_OF_KERNEL_STACK_PADDING); \
__regs__ - 1; \
})

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 7740edd..ba115eb 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -13,6 +13,33 @@
#include <asm/types.h>

/*
+ * TOP_OF_KERNEL_STACK_PADDING is a number of unused bytes that we
+ * reserve at the top of the kernel stack. We do it because of a nasty
+ * 32-bit corner case. On x86_32, the hardware stack frame is
+ * variable-length. Except for vm86 mode, struct pt_regs assumes a
+ * maximum-length frame. If we enter from CPL 0, the top 8 bytes of
+ * pt_regs don't actually exist. Ordinarily this doesn't matter, but it
+ * does in at least one case:
+ *
+ * If we take an NMI early enough in SYSENTER, then we can end up with
+ * pt_regs that extends above sp0. On the way out, in the espfix code,
+ * we can read the saved SS value, but that value will be above sp0.
+ * Without this offset, that can result in a page fault. (We are
+ * careful that, in this case, the value we read doesn't matter.)
+ *
+ * In vm86 mode, the hardware frame is much longer still, but we neither
+ * access the extra members from NMI context, nor do we write such a
+ * frame at sp0 at all.
+ *
+ * x86_64 has a fixed-length stack frame.
+ */
+#ifdef CONFIG_X86_32
+# define TOP_OF_KERNEL_STACK_PADDING 8
+#else
+# define TOP_OF_KERNEL_STACK_PADDING 0
+#endif
+
+/*
* low level task data that entry.S needs immediate access to
* - this struct should fit entirely inside of one cache line
* - this struct shares the supervisor stack pages
diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index e33ba51..4c8cc34 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -398,7 +398,7 @@ sysenter_past_esp:
* A tiny bit of offset fixup is necessary - 4*4 means the 4 words
* pushed above; +8 corresponds to copy_thread's esp0 setting.
*/
- pushl_cfi ((TI_sysenter_return)-THREAD_SIZE+8+4*4)(%esp)
+ pushl_cfi ((TI_sysenter_return)-THREAD_SIZE+TOP_OF_KERNEL_STACK_PADDING+4*4)(%esp)
CFI_REL_OFFSET eip, 0

pushl_cfi %eax
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/