Re: [PATCH 2/3] vsyscall_64: allow SECCOMP_RET_TRACErs to skip

From: Will Drewry
Date: Sat Jul 14 2012 - 12:56:02 EST


On Sat, Jul 14, 2012 at 11:21 AM, Andrew Lutomirski <luto@xxxxxxx> wrote:
> On Sat, Jul 14, 2012 at 9:15 AM, Andrew Lutomirski <luto@xxxxxxx> wrote:
>> On Sat, Jul 14, 2012 at 8:57 AM, Will Drewry <wad@xxxxxxxxxxxx> wrote:
>>> On Sat, Jul 14, 2012 at 10:50 AM, Will Drewry <wad@xxxxxxxxxxxx> wrote:
>>>> On Sat, Jul 14, 2012 at 10:44 AM, Andrew Lutomirski <luto@xxxxxxx> wrote:
>>>>> I think I'd prefer if changing to something other than whatever value is
>>>>> used to cancel the syscall resulted in a crash rather than just being
>>>>> ignored.
>>>>
>>>> I was trying to keep as much seccomp-ptrace behavior intact rather
>>>> than making it terminal in this special case. Is there a reason why
>>>> it'd make more sense to crash?
>>>
>>> Unless you meant something the tracer could catch? That may make
>>> sense, but they could also use singlestep or whatever else to get
>>> similar behavior. But maybe I'm missing the bigger picture!
>>
>> I think it would be nice to not introduce any special behavior that
>> things might rely on if we do this better in the future. Similarly,
>> for almost all purposes, a tracer could change gettimeofday to write,
>> but there would be a silent behavior change if gettimeofday were
>> entered via vsyscall.
>>
>> What's the standard way of skipping a syscall? sigreturn? (I don't
>> know off the top of my head what sigreturn does.) sys_ni_syscall?

Most implementations I've seen just set orig_rax to -1, but in
practice any system call that is invalid will result in the system
call being skipped in the normal syscall path. So I could tweak it to
check <0 > __NR_syscalls and add a sigsys generator if it is valid.

Do you think that'd be the most correct move? (As you say below, the
impact of _any_ changes here are very minor :)

>> I'd be all for making those continue to work but making anything that
>> can't be emulated correctly do something sufficiently unpleasant that
>> people won't do it. Is there a syscall that does nothing at all?
>
> Here's my suggestion for now: any attempt by the seccomp filter to do
> anything other than executing the syscall unchanged, skipping it
> entirely, or killing results in a trap or sigsys with the syscall
> number unchanged.
>
> In the RET_TRACE case, the SIGSYS restarts at the mov nr,%rax
> instruction. The tracer can deal with it accordingly (via checking
> the rip) -- if the tracer changes rax, it'll get changed right back by
> the emulated mov nr,%rax. If this results in an infinite loop, so be
> it.
>
> In the RET_TRAP case, do the same thing but via the special ptrace
> event instead of SIGSYS.

You've lost me here. The flow of a RET_TRAP is:
- enter vsyscall at 0xffffffffff600[]00
- run seccomp
- get ret_trap
- queue up a synchronous trap
- skip the vsyscall
- return

Even if the trap were to interrupt the current vsyscall trap handler,
the skip value would still be -1 and the syscall would still get
skipped without touching sp or ip.

Traps cannot provide kernel-side arbitration of allowed syscalls, they
can just emulate them in userspace before "returning" to the original
callsite. In this case, vsyscall makes returning there harder and the
quirk is to return to the prior address if the ip is in vsyscall's
range.

> This way, the special rip & ~0x0c00 == 0xffffffffff600000 handling
> becomes part of the ABI, but it looks like the mov instruction
> trapped, which it more or less did. If we want RET_TRAP to be able to
> skip the syscall, let it happen the normal way.

I *think* trap is ok?

> This stuff probably barely matters. For example, ptrace currently
> doesn't work right when vsyscalls are being emulated. No one appears
> to care.

Agreed :) I don't mind making tweaks to get it right, but this only
matters to users that want to:
- use seccomp filter
- with ptrace (or trap with resumption and not sigreturn)
- of time, gettimeofday, and getcpu
since they will then have to include quirk management _just_ in case
their code is linked against something using vsyscall and
vsyscall=emulate is in effect!

thanks!
will
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/