Re: PTRACE_SYSCALL && vsyscall (Was: arch_check_bp_in_kernelspace:fix the range check)

From: u3557
Date: Wed Dec 05 2012 - 08:14:27 EST


Dear Jan,

> x86 debug registers are already very scarce. Besides that userland
> applications know they have 4 of them available so it would also break
> them.

If a userland application wants to cheat, then it has no need to bypass
the debug registers: even if there were 4096 of them, covering the whole
vsyscall page, it could simply copy the whole vsyscall page somewhere
else, then run (or emulate) it, or look directly at the raw data within
the vsyscall page. The only way to overcome that would be to make the
vsyscall page either non-existent or unreadable.
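
To illustrate how little stands in the way, here is a rough sketch (my own illustration, not kernel code) of a process looking up the permissions of its own vsyscall mapping in /proc/self/maps. The helper names parse_vsyscall_line() and vsyscall_perms() are hypothetical, and whether a "[vsyscall]" line appears at all depends on the kernel and its configuration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: if `line` is the "[vsyscall]" entry of a maps
 * file, copy its permission field (e.g. "r-xp") into `perms` and
 * return 1; otherwise return 0. */
int parse_vsyscall_line(const char *line, char perms[5])
{
    if (strstr(line, "[vsyscall]") == NULL)
        return 0;
    /* Skip the "start-end" address field, then read the 4 permission chars. */
    return sscanf(line, "%*s %4s", perms) == 1;
}

/* Hypothetical helper: scan /proc/self/maps for the vsyscall mapping.
 * Returns 1 if found (perms filled in), 0 if absent, -1 if the file
 * could not be opened. */
int vsyscall_perms(char perms[5])
{
    char line[512];
    int found = 0;
    FILE *f = fopen("/proc/self/maps", "r");

    if (f == NULL)
        return -1;
    while (!found && fgets(line, sizeof line, f) != NULL)
        found = parse_vsyscall_line(line, perms);
    fclose(f);
    return found;
}
```

If the permission field still contains "r", the process can read the page directly, whatever is done with the debug registers.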

Personally, letting userland applications read the vsyscall page can't
hurt me or my applications. But if someone else is concerned about such
malicious programs (is anyone?) and needs to construct the
strictest-of-strict jail, where jailed applications cannot glean any
information about the kernel they run on no matter how hard they try,
then they must at least make the vsyscall page unreadable and rely
either on kernel emulation or on a SIGSEGV (the latter would be quite
sufficient for my own needs as a substitute for debug registers,
but unfortunately not for the current version of "strace").

If, as I was told, it's too hard to remove the vsyscall page on a
per-process basis, then it's sufficient to make it unreadable on
context-switch.

My concern, however, is not with the bad guys, but with good honest
programs that would run incorrectly if allowed to call "time()" or
"gettimeofday()" unsupervised. No good program or library jumps into
the vsyscall page except into its 3 official entry points.
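
For reference, a tracer could recognize those entry points with something like the sketch below. It assumes the usual x86-64 layout: the page at the fixed address 0xffffffffff600000, with the entries for gettimeofday(), time() and getcpu() spaced 1024 bytes apart. The function name vsyscall_entry_index() is hypothetical.

```c
#include <stdint.h>

/* Assumed layout of the legacy x86-64 vsyscall page. */
#define VSYSCALL_BASE 0xffffffffff600000ULL /* fixed address of the page */
#define VSYSCALL_SIZE 0x1000ULL             /* one 4 KiB page */
#define VSYSCALL_SLOT 0x400ULL              /* entries are 1024 bytes apart */

/* Hypothetical helper: map an instruction pointer to a vsyscall entry.
 * Returns 0 (gettimeofday), 1 (time), 2 (getcpu), or -1 if `ip` is not
 * exactly one of the three official entry points. */
int vsyscall_entry_index(uint64_t ip)
{
    uint64_t off, idx;

    if (ip < VSYSCALL_BASE || ip - VSYSCALL_BASE >= VSYSCALL_SIZE)
        return -1;                  /* outside the vsyscall page */
    off = ip - VSYSCALL_BASE;
    if (off % VSYSCALL_SLOT != 0)
        return -1;                  /* inside the page, but not an entry */
    idx = off / VSYSCALL_SLOT;
    return idx < 3 ? (int)idx : -1; /* the fourth slot is not an entry point */
}
```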

In summary, it should be decided:

If it is important enough for Linux to support paranoically strict
jails, then full kernel emulation of PTRACE_SYSCALL (and PTRACE_SYSEMU)
is inescapable.

If, on the other hand, there is only a need to allow applications such as
mine and "strace"/"gdb" to trap system-calls that occur via the vsyscall
page, then in principle a variety of options are possible:

1. Allow the x86 hardware debug registers to be set to addresses within
   the vsyscall page.
2. Optional (per-process) removal of execute permission from the vsyscall
   page.
3. Optional (per-process) removal of both read and execute permissions
   from the vsyscall page.
4. Optional (per-process) elimination of the vsyscall page altogether.
5. Kernel vsyscall emulation code to send some signal or event to traced
   processes if the ptracer asked for it (using a new ptrace option).
6. Complete and transparent emulation of PTRACE_SYSCALL/PTRACE_SYSEMU.

Option #1 requires the least effort (a 2-line fix).
Option #6 requires the most effort.
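
From the tracer's side, option #1 might look roughly like the sketch below. It is only an illustration under stated assumptions: the names dr7_exec_breakpoint() and set_exec_breakpoint() are hypothetical, the code overwrites DR7 instead of merging with breakpoints already set, and whether the kernel accepts an address inside the vsyscall page is precisely what option #1 asks for (where it refuses, the first ptrace call simply fails).

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical helper: DR7 value that enables debug register `slot`
 * (0..3) as a local instruction-execution breakpoint. The enable bit
 * L<slot> is bit 2*slot; the R/W and LEN fields stay 00, as required
 * for execute breakpoints. Returns 0 for an invalid slot. */
uint64_t dr7_exec_breakpoint(unsigned slot)
{
    if (slot > 3)
        return 0;
    return (uint64_t)1 << (2 * slot);
}

#if defined(__linux__) && defined(__x86_64__)
#include <sys/ptrace.h>
#include <sys/user.h>

/* Hypothetical helper: ask the kernel to arm an execute breakpoint at
 * `addr` in the stopped tracee `pid`. Returns 0 on success, -1 on error.
 * Simplification: DR7 is overwritten, so other slots get disabled. */
int set_exec_breakpoint(pid_t pid, unsigned slot, uint64_t addr)
{
    size_t reg_size = sizeof(((struct user *)0)->u_debugreg[0]);
    size_t base = offsetof(struct user, u_debugreg);

    if (slot > 3)
        return -1;
    /* Write the breakpoint address into DR<slot>. */
    if (ptrace(PTRACE_POKEUSER, pid, (void *)(base + slot * reg_size),
               (void *)(uintptr_t)addr) == -1)
        return -1;
    /* Enable the slot through DR7. */
    if (ptrace(PTRACE_POKEUSER, pid, (void *)(base + 7 * reg_size),
               (void *)(uintptr_t)dr7_exec_breakpoint(slot)) == -1)
        return -1;
    return 0;
}
#endif
```

A tracer would call set_exec_breakpoint() once per vsyscall entry point it cares about, which uses up to three of the four available registers.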

Best Regards,
Amnon.

> On Sun, 02 Dec 2012 20:30:58 +0100, Oleg Nesterov wrote:
>> Yes, that is why I said this needs the new option.
>
> I do not mind new options although personally I do not find them
> meaningful for an already deprecated ABI compatibility-only issue.
>
>
>> If the tracer does PTRACE_SYSCALL the tracee reports syscall exit
>> _after_ gettimeofday/etc. The tracer can look at regs->orig_ax == -1
>> and detect that this is not syscall but vsyscall, it can look at
>> regs->ip then (not with the patch below).
>
> I believe applications just call PTRACE_SYSCALL twice, without checking
> orig_eax. At least strace and its TCB_INSYSCALL looks so.
>
>
> On Mon, 03 Dec 2012 00:54:58 +0100, u3557@xxxxxxxxxxxxxxxxxx wrote:
>> The beauty of using the x86 debug-registers,
>
> x86 debug registers are already very scarce. Besides that userland
> applications know they have 4 of them available so it would also break
> them.
>
>
> Regards,
> Jan
>

