[tip:x86/mm] x86/mm/vsyscall: Consider vsyscall page part of user address space

From: tip-bot for Dave Hansen
Date: Tue Oct 09 2018 - 11:05:44 EST


Commit-ID: 3ae0ad92f53e0f05cf6ab781230b7902b88f73cd
Gitweb: https://git.kernel.org/tip/3ae0ad92f53e0f05cf6ab781230b7902b88f73cd
Author: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
AuthorDate: Fri, 28 Sep 2018 09:02:30 -0700
Committer: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
CommitDate: Tue, 9 Oct 2018 16:51:16 +0200

x86/mm/vsyscall: Consider vsyscall page part of user address space

The vsyscall page is weird. It sits in what is traditionally part of
the kernel address space, but it has user permissions and we handle
faults on it like we would on a user page: with interrupts on.

Right now, we handle vsyscall emulation in the "bad_area" code, which
is used for both user-address-space and kernel-address-space faults.
Move the handling to the user-address-space code *only* and ensure we
get there by "excluding" the vsyscall page from the kernel address
space via a check in fault_in_kernel_space().

Since fault_in_kernel_space() is also used on 32-bit, guard the new
check with IS_ENABLED(CONFIG_X86_64) to make it clear that this path
is only taken on 64-bit. Also move the unlikely() into
is_vsyscall_vaddr() itself.
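
As an illustration only (not part of the patch), the classification
described above can be modelled in user space. This is a rough sketch:
the sketch_* names are hypothetical, and the constants are assumed
x86-64 values (4 KiB pages, 4-level paging, vsyscall page at
0xffffffffff600000, X86_PF_INSTR as bit 4 of the error code) rather
than values taken from kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86-64 values, hard-coded for illustration only. */
#define SKETCH_PAGE_SIZE	UINT64_C(0x1000)
#define SKETCH_PAGE_MASK	(~(SKETCH_PAGE_SIZE - 1))
#define SKETCH_VSYSCALL_ADDR	UINT64_C(0xffffffffff600000)
#define SKETCH_TASK_SIZE_MAX	((UINT64_C(1) << 47) - SKETCH_PAGE_SIZE)
#define SKETCH_PF_INSTR		UINT64_C(0x10)	/* instruction fetch fault */

/* True if vaddr lies anywhere inside the single vsyscall page. */
bool sketch_is_vsyscall_vaddr(uint64_t vaddr)
{
	return (vaddr & SKETCH_PAGE_MASK) == SKETCH_VSYSCALL_ADDR;
}

/*
 * Model of the patched check: the vsyscall page sits above
 * TASK_SIZE_MAX but is deliberately not treated as kernel space.
 */
bool sketch_fault_in_kernel_space(uint64_t address)
{
	if (sketch_is_vsyscall_vaddr(address))
		return false;

	return address >= SKETCH_TASK_SIZE_MAX;
}

/* Only instruction fetches from the vsyscall page are emulation candidates. */
bool sketch_wants_vsyscall_emulation(uint64_t error_code, uint64_t address)
{
	return (error_code & SKETCH_PF_INSTR) &&
	       sketch_is_vsyscall_vaddr(address);
}

/* Small self-check exercising the three address classes. */
void sketch_self_check(void)
{
	/* Ordinary user address: user path, no emulation. */
	assert(!sketch_fault_in_kernel_space(UINT64_C(0x400000)));
	/* Vsyscall page: high address, still routed to the user path. */
	assert(!sketch_fault_in_kernel_space(SKETCH_VSYSCALL_ADDR));
	/* The page right after the vsyscall page is kernel space again. */
	assert(sketch_fault_in_kernel_space(SKETCH_VSYSCALL_ADDR +
					    SKETCH_PAGE_SIZE));
}
```

In this model, a data read from the vsyscall page takes the user path
but is not an emulation candidate; only an instruction fetch from it
is.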

This helps clean up the kernel fault handling path by removing a case
that can happen in normal operation. (Yeah, yeah, we can argue about
the vsyscall page being "normal" or not.) This also makes sanity
checks easier, like the "we never take pkey faults in the kernel
address space" check in the next patch.

Cc: x86@xxxxxxxxxx
Cc: Jann Horn <jannh@xxxxxxxxxx>
Cc: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Andy Lutomirski <luto@xxxxxxxxxx>
Signed-off-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Link: http://lkml.kernel.org/r/20180928160230.6E9336EE@xxxxxxxxxxxxxxxxxx
---
arch/x86/mm/fault.c | 38 +++++++++++++++++++++++++-------------
1 file changed, 25 insertions(+), 13 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 7a627ac3a0d2..7e0fa7e24168 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -846,7 +846,7 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,
  */
 static bool is_vsyscall_vaddr(unsigned long vaddr)
 {
-	return (vaddr & PAGE_MASK) == VSYSCALL_ADDR;
+	return unlikely((vaddr & PAGE_MASK) == VSYSCALL_ADDR);
 }

static void
@@ -872,18 +872,6 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 		if (is_errata100(regs, address))
 			return;
 
-#ifdef CONFIG_X86_64
-		/*
-		 * Instruction fetch faults in the vsyscall page might need
-		 * emulation.
-		 */
-		if (unlikely((error_code & X86_PF_INSTR) &&
-			     is_vsyscall_vaddr(address))) {
-			if (emulate_vsyscall(regs, address))
-				return;
-		}
-#endif
-
 		/*
 		 * To avoid leaking information about the kernel page table
 		 * layout, pretend that user-mode accesses to kernel addresses
@@ -1192,6 +1180,14 @@ access_error(unsigned long error_code, struct vm_area_struct *vma)

 static int fault_in_kernel_space(unsigned long address)
 {
+	/*
+	 * On 64-bit systems, the vsyscall page is at an address above
+	 * TASK_SIZE_MAX, but is not considered part of the kernel
+	 * address space.
+	 */
+	if (IS_ENABLED(CONFIG_X86_64) && is_vsyscall_vaddr(address))
+		return false;
+
 	return address >= TASK_SIZE_MAX;
 }

@@ -1359,6 +1355,22 @@ void do_user_addr_fault(struct pt_regs *regs,
 	if (sw_error_code & X86_PF_INSTR)
 		flags |= FAULT_FLAG_INSTRUCTION;
 
+#ifdef CONFIG_X86_64
+	/*
+	 * Instruction fetch faults in the vsyscall page might need
+	 * emulation. The vsyscall page is at a high address
+	 * (>PAGE_OFFSET), but is considered to be part of the user
+	 * address space.
+	 *
+	 * The vsyscall page does not have a "real" VMA, so do this
+	 * emulation before we go searching for VMAs.
+	 */
+	if ((sw_error_code & X86_PF_INSTR) && is_vsyscall_vaddr(address)) {
+		if (emulate_vsyscall(regs, address))
+			return;
+	}
+#endif
+
 	/*
 	 * Kernel-mode access to the user address space should only occur
 	 * on well-defined single instructions listed in the exception