[PATCH v3 2/3] x86/signal: Rewire the restart_block() syscall to have a constant nr

From: Andy Lutomirski
Date: Mon Jun 20 2016 - 20:06:20 EST


Suppose a 64-bit task A traces a 32-bit task B. B makes a syscall
that uses ERESTART_RESTARTBLOCK and gets a signal. A catches
syscall exit, snapshots B's regs, changes the regs, and resumes.
Then A restores the snapshot of B's regs.

A expects B to resume where it left off when snapshotted, which means
that B should execute sys_restart_block(). We have a big set of hacks
in place that make this work in some cases, but in this particular
case, the hacks fall apart. When A restores regs, because A is
64-bit, ptrace() will *not* set TS_I386_REGS_POKED. But, because B
isn't stopped in a syscall, TS_COMPAT won't be set either. As a
result, the current code will set nr=219 (the 64-bit
__NR_restart_syscall) and then promptly invoke madvise() (the 32-bit
syscall with nr==219).

This patch fixes the bug and simplifies the code simultaneous by
simply wiring up nr==380 to refer to sys_restart_block() in all cases.

Cc: Pedro Alves <palves@xxxxxxxxxx>
Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
Cc: Kees Cook <keescook@xxxxxxxxxxxx>
Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
---
arch/x86/entry/syscalls/syscall_32.tbl | 2 ++
arch/x86/entry/syscalls/syscall_64.tbl | 3 +++
arch/x86/kernel/signal.c | 34 ++++++++--------------------------
3 files changed, 13 insertions(+), 26 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 4cddd17153fb..9fc9272d0c41 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -386,3 +386,5 @@
377 i386 copy_file_range sys_copy_file_range
378 i386 preadv2 sys_preadv2 compat_sys_preadv2
379 i386 pwritev2 sys_pwritev2 compat_sys_pwritev2
+# Same as restart_syscall but with the same nr for all ABIs.
+380 i386 restart_syscall2 sys_restart_syscall
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 555263e385c9..f074952eaad8 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -336,6 +336,9 @@
327 64 preadv2 sys_preadv2
328 64 pwritev2 sys_pwritev2

+# Same as restart_syscall but with the same nr for all ABIs.
+380 common restart_syscall2 sys_restart_syscall
+
#
# x32-specific system call numbers start at 512 to avoid cache impact
# for native 64-bit operation.
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 6b952e1d8db8..b9f5133867f3 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -761,35 +761,17 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs)
static inline unsigned long get_nr_restart_syscall(const struct pt_regs *regs)
{
/*
- * This function is fundamentally broken as currently
- * implemented.
- *
- * The idea is that we want to trigger a call to the
- * restart_block() syscall and that we want in_ia32_syscall(),
- * in_x32_syscall(), etc. to match whatever they were in the
- * syscall being restarted. We assume that the syscall
- * instruction at (regs->ip - 2) matches whatever syscall
- * instruction we used to enter in the first place.
- *
- * The problem is that we can get here when ptrace pokes
- * syscall-like values into regs even if we're not in a syscall
- * at all.
- *
- * For now, we maintain historical behavior and guess based on
- * stored state. We could do better by saving the actual
- * syscall arch in restart_block or (with caveats on x32) by
- * checking if regs->ip points to 'int $0x80'. The current
- * behavior is incorrect if a tracer has a different bitness
- * than the tracee.
+ * We can't reliably distinguish between restarting an i386
+ * syscall and an x86_64 syscall without decoding the
+ * instruction at regs->ip -= 2. Rather than trying, restart
+ * using __NR_restart_syscall2, which works regardless of ABI.
+ * We still want to preserve the x32 state, but we can do that
+ * directly.
*/
-#ifdef CONFIG_IA32_EMULATION
- if (current_thread_info()->status & (TS_COMPAT|TS_I386_REGS_POKED))
- return __NR_ia32_restart_syscall;
-#endif
#ifdef CONFIG_X86_X32_ABI
- return __NR_restart_syscall | (regs->orig_ax & __X32_SYSCALL_BIT);
+ return __NR_restart_syscall2 | (regs->orig_ax & __X32_SYSCALL_BIT);
#else
- return __NR_restart_syscall;
+ return __NR_restart_syscall2;
#endif
}

--
2.5.5