[PATCH v2 00/10] Remove syscall instructions at fixed addresses

From: Andy Lutomirski
Date: Sun May 29 2011 - 23:51:40 EST


This series is really five different parts.



The first part (patch 1/10) is just a bugfix from the last vdso series.
The bug should be harmless but it's pretty dumb. This is almost
certainly 3.0 material.



The second part removes a bunch of syscall instructions in kernel space
at fixed addresses that user code can execute. This is not all that
well tested or inspected at this point.

Several are data that isn't marked NX. Patch 2/10 makes vvars NX and
5/10 makes the HPET NX.
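
As a rough illustration of what "NX" means here: on x86-64, bit 63 of a page-table entry is the no-execute bit (when EFER.NXE is enabled), so a mapping is executable only while that bit is clear. The sketch below is standalone user-space C, not the kernel's code. The macro and function names are made up for the example, and it ignores the upper-level table entries.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86-64 page-table entry bits (architectural positions). */
#define PTE_PRESENT (1ULL << 0)
#define PTE_WRITE   (1ULL << 1)
#define PTE_USER    (1ULL << 2)
#define PTE_NX      (1ULL << 63)

/* User code can execute from a page only if the entry is present,
 * user-accessible, and not marked NX. */
static bool user_can_execute(uint64_t pte)
{
	return (pte & PTE_PRESENT) && (pte & PTE_USER) && !(pte & PTE_NX);
}

/* Hypothetical helper: make a mapping non-executable by setting NX. */
static uint64_t make_nx(uint64_t pte)
{
	return pte | PTE_NX;
}
```

A user-readable data page without NX passes user_can_execute(); after make_nx() it no longer does, which is the property patches 2/10 and 5/10 are after.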

The time() vsyscall contains an explicit syscall fallback. Patch 3/10
removes it.

The last one is the gettimeofday fallback. We need that, but it doesn't
have to be a real syscall. Patch 4/10 adds int 0xcc (callable only from
the vsyscall page) that implements the gettimeofday fallback and nothing
else.
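
The "callable only from the vsyscall page" restriction can be pictured as a range check on the instruction pointer saved at interrupt entry. This is only a sketch: the legacy vsyscall page does sit at the fixed address 0xffffffffff600000, but the function name and the exact form of the check are assumptions, not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define VSYSCALL_PAGE_START 0xffffffffff600000ULL /* fixed legacy address */
#define VSYSCALL_PAGE_SIZE  4096ULL

/* Hypothetical check: accept int 0xcc only if the saved instruction
 * pointer lies inside the vsyscall page. The unsigned subtraction wraps
 * for addresses below the page, so one comparison covers both bounds. */
static bool ip_in_vsyscall_page(uint64_t ip)
{
	return ip - VSYSCALL_PAGE_START < VSYSCALL_PAGE_SIZE;
}
```

With a check like this, an int 0xcc issued from ordinary user code would be rejected rather than treated as a gettimeofday request.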



The third part is a more aggressive cleanup of the vsyscall page. It
removes the code implementing the vsyscalls and replaces it with magic
int 0xcc incantations. These incantations are specifically designed so
that jumping into them at funny offsets will either work fine or
generate some kind of fault. Patch 8/10 is optional and might want to
be hidden away in CONFIG_EMBEDDED for a while. This needs some more
testing in CONFIG_UNSAFE_VSYSCALLS=y mode and a lot of careful
inspection in CONFIG_UNSAFE_VSYSCALLS=n mode.

Patch 6/10 removes venosys. It's been broken (crashes) for a couple
years and it doesn't do anything particularly useful anyway.

Patch 7/10 fills the vsyscall page with 0xcc instead of 0x00. 0xcc is
an explicit trap (int3).
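
To see why the fill byte matters: a run of 0x00 bytes decodes on x86 as "add %al,(%rax)", which may execute without faulting, whereas each 0xcc byte is a one-byte int3. A user-space sketch of the fill follows, with a plain buffer standing in for the real page and made-up names throughout.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE_BYTES 4096
#define INT3_OPCODE     0xcc

/* Fill the whole page with int3, then copy one piece of code to the
 * given offset. Every byte not covered by code traps if executed.
 * The caller must ensure off + code_len <= PAGE_SIZE_BYTES. */
static void fill_page(unsigned char *page, size_t off,
		      const unsigned char *code, size_t code_len)
{
	memset(page, INT3_OPCODE, PAGE_SIZE_BYTES);
	memcpy(page + off, code, code_len);
}
```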

Patch 8/10 adds a config option to emulate the vsyscalls. The int 0xcc
incantation intentionally depends on the config option -- it is not ABI.



Patch 9/10 randomizes the int 0xcc incantation at bootup. It is pretty
much worthless for security (there are only three choices for the random
number and it's easy to figure out which one is in use) but it prevents
overly clever userspace programs from thinking that the incantation is
ABI. One instrumentation tool author offered to hard-code special
handling for int 0xcc; I want to discourage this approach.
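
The idea can be sketched as picking one of three magic values once at boot and rejecting any int 0xcc whose al does not match. The three values below are placeholders invented for the example (the real ones are not listed in this message), and the function names are hypothetical too.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values, NOT the ones the patch uses. */
static const uint8_t magic_choices[3] = { 0x11, 0x22, 0x33 };

/* Value selected for this boot. */
static uint8_t boot_magic;

/* Called once at boot with some random number. */
static void pick_boot_magic(uint32_t rnd)
{
	boot_magic = magic_choices[rnd % 3];
}

/* The handler accepts the call only if al carries this boot's value. */
static bool magic_ok(uint8_t al)
{
	return al == boot_magic;
}
```

A program that hard-codes one value works on roughly one boot in three, which is enough to show its author that the value is not ABI.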

Patch 10/10 adds some documentation for entry_64.S. A lot of the magic
in there is far from obvious.


Changes from v1:
- Patches 6-10 are new.
- The int 0xcc code is much prettier and has lots of bugs fixed.
- I've decided to let everyone compile turbostat on their own :)

Andy Lutomirski (10):
  x86-64: Fix alignment of jiffies variable
  x86-64: Give vvars their own page
  x86-64: Remove kernel.vsyscall64 sysctl
  x86-64: Replace vsyscall gettimeofday fallback with int 0xcc
  x86-64: Map the HPET NX
  x86-64: Remove vsyscall number 3 (venosys)
  x86-64: Fill unused parts of the vsyscall page with 0xcc
  x86-64: Emulate vsyscalls
  x86-64: Randomize int 0xcc magic al values at boot
  x86-64: Document some of entry_64.S

 Documentation/x86/entry_64.txt       |  95 +++++++++++
 arch/x86/Kconfig                     |  17 ++
 arch/x86/include/asm/fixmap.h        |   1 +
 arch/x86/include/asm/irq_vectors.h   |   6 +-
 arch/x86/include/asm/pgtable_types.h |   6 +-
 arch/x86/include/asm/traps.h         |   4 +
 arch/x86/include/asm/vgtod.h         |   1 -
 arch/x86/include/asm/vsyscall.h      |   6 +
 arch/x86/include/asm/vvar.h          |  24 ++--
 arch/x86/kernel/Makefile             |   3 +
 arch/x86/kernel/entry_64.S           |   4 +
 arch/x86/kernel/hpet.c               |   2 +-
 arch/x86/kernel/traps.c              |   4 +
 arch/x86/kernel/vmlinux.lds.S        |  47 +++---
 arch/x86/kernel/vsyscall_64.c        | 289 +++++++++++++++++++++++++++++-----
 arch/x86/kernel/vsyscall_emu_64.S    |  40 +++++
 arch/x86/vdso/vclock_gettime.c       |  55 +++----
 17 files changed, 486 insertions(+), 118 deletions(-)
 create mode 100644 Documentation/x86/entry_64.txt
 create mode 100644 arch/x86/kernel/vsyscall_emu_64.S

--
1.7.5.1
