[PATCH] x86/fpu: don't let PTRACE_SETREGSET set xcomp_bv

From: Eric Biggers
Date: Fri Sep 08 2017 - 12:53:16 EST


From: Eric Biggers <ebiggers@xxxxxxxxxx>

On x86, userspace can use ptrace(PTRACE_SETREGSET, pid, NT_X86_XSTATE,
&iov) to set a task's extended state registers. Registers can be set to
any value, but the kernel assumes that the xregs_state itself remains
valid in the sense that the CPU can restore it.

However, in the case where the kernel is using the non-compacted
extended state format (which it does whenever the XSAVES instruction is
unavailable), userspace could also set the xcomp_bv field of the
xstate_header to any value. But all bits in that field are reserved in
the non-compacted case, so when switching to a task with nonzero
xcomp_bv, the XRSTOR instruction failed with a #GP fault. This caused
the WARN_ON_FPU(err) in copy_kernel_to_xregs() to be hit and, since the
error is otherwise ignored, could also leak registers from the task
previously executing on the CPU.

Fix the bug by ignoring the user-specified value of xcomp_bv in the
non-compacted case and instead always setting it to 0.

Note: we used to clear all xstate registers if XRSTOR failed, but this
was removed by commit 9ccc27a5d297 ("x86/fpu: Remove error return values
from copy_kernel_to_*regs() functions"). It perhaps should be added
back, to prevent this class of bug from causing an information leak.

This bug was found by syzkaller, which hit the above-mentioned
WARN_ON_FPU():

WARNING: CPU: 1 PID: 0 at ./arch/x86/include/asm/fpu/internal.h:373 __switch_to+0x5b5/0x5d0
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.13.0 #453
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff9ba2bc8e42c0 task.stack: ffffa78cc036c000
RIP: 0010:__switch_to+0x5b5/0x5d0
RSP: 0000:ffffa78cc08bbb88 EFLAGS: 00010082
RAX: 00000000fffffffe RBX: ffff9ba2b8bf2180 RCX: 00000000c0000100
RDX: 00000000ffffffff RSI: 000000005cb10700 RDI: ffff9ba2b8bf36c0
RBP: ffffa78cc08bbbd0 R08: 00000000929fdf46 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff9ba2bc8e42c0
R13: 0000000000000000 R14: ffff9ba2b8bf3680 R15: ffff9ba2bf5d7b40
FS: 00007f7e5cb10700(0000) GS:ffff9ba2bf400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004005cc CR3: 0000000079fd5000 CR4: 00000000001406e0
Call Trace:
Code: 84 00 00 00 00 00 e9 11 fd ff ff 0f ff 66 0f 1f 84 00 00 00 00 00 e9 e7 fa ff ff 0f ff 66 0f 1f 84 00 00 00 00 00 e9 c2 fa ff ff <0f> ff 66 0f 1f 84 00 00 00 00 00 e9 d4 fc ff ff 66 66 2e 0f 1f

The following C program reproduces the bug quite reliably. The expected
behavior is that the program spin forever with no output. However, on a
buggy kernel running on a processor with the "xsave" feature but without
the "xsaves" feature (e.g. Sandy Bridge through Broadwell for Intel),
within a second or two the program reports that the xmm registers were
corrupted, i.e. were not restored correctly. With
CONFIG_X86_DEBUG_FPU=y it also hits the above kernel warning.

#define _GNU_SOURCE
#include <stdbool.h>
#include <inttypes.h>
#include <linux/elf.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
int pid = fork();
uint64_t xstate[512];
struct iovec iov = { .iov_base = xstate, .iov_len = sizeof(xstate) };

if (pid == 0) {
bool tracee = true;
for (int i = 0; i < sysconf(_SC_NPROCESSORS_ONLN) && tracee; i++)
tracee = (fork() != 0);
uint32_t xmm0[4] = { [0 ... 3] = tracee ? 0x12345678 : 0xDEADBEEF };
asm volatile(" movdqu %0, %%xmm0\n"
" mov %0, %%rbx\n"
"1: movdqu %%xmm0, %0\n"
" mov %0, %%rax\n"
" cmp %%rax, %%rbx\n"
" je 1b\n"
: "+m" (xmm0) : : "rax", "rbx", "xmm0");
printf("BUG: xmm registers corrupted! tracee=%d, xmm0=%08X%08X%08X%08X\n",
tracee, xmm0[0], xmm0[1], xmm0[2], xmm0[3]);
} else {
usleep(100000);
ptrace(PTRACE_ATTACH, pid, 0, 0);
wait(NULL);
ptrace(PTRACE_GETREGSET, pid, NT_X86_XSTATE, &iov);
xstate[65] = -1;
ptrace(PTRACE_SETREGSET, pid, NT_X86_XSTATE, &iov);
ptrace(PTRACE_CONT, pid, 0, 0);
wait(NULL);
}
return 1;
}

Fixes: 0b29643a5843 ("x86/xsaves: Change compacted format xsave area header")
Cc: Andy Lutomirski <luto@xxxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
Cc: Fenghua Yu <fenghua.yu@xxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Kevin Hao <haokexin@xxxxxxxxx>
Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
Cc: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
Cc: Yu-cheng Yu <yu-cheng.yu@xxxxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx> [v3.17+]
Signed-off-by: Eric Biggers <ebiggers@xxxxxxxxxx>
---
arch/x86/kernel/fpu/regset.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index b188b16841e3..718b791bc037 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -131,11 +131,15 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,

fpu__activate_fpstate_write(fpu);

- if (boot_cpu_has(X86_FEATURE_XSAVES))
+ if (boot_cpu_has(X86_FEATURE_XSAVES)) {
ret = copyin_to_xsaves(kbuf, ubuf, xsave);
- else
+ } else {
ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, xsave, 0, -1);

+ /* xcomp_bv must be zero when using uncompacted format */
+ xsave->header.xcomp_bv = 0;
+ }
+
/*
* In case of failure, mark all states as init:
*/
--
2.14.1.581.gf28d330327-goog