Re: [PATCH] seccomp: plug syscall-dodging ptrace hole

From: Andy Lutomirski
Date: Thu May 26 2016 - 22:11:08 EST

On Thu, May 26, 2016 at 2:04 PM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> One problem with seccomp was that ptrace could be used to change a
> syscall after seccomp filtering had completed. This was a well documented
> limitation, and it was recommended to block ptrace when defining a filter
> to avoid this problem. This can be quite a limitation for containers or
> other places where ptrace is desired even under seccomp filters.
> Since seccomp filtering has been split into pre-trace and trace phases
> (phase1 and phase2 respectively), it's possible to re-run phase1 seccomp
> after ptrace. This makes that change, and updates the test suite for
> both SECCOMP_RET_TRACE and PTRACE_SYSCALL manipulation.

I like fixing the hole, but I don't like this fix.

The two-phase seccomp mechanism is messy. I wrote it because it was a
huge speedup. Since then, I've made a ton of changes to the way that
x86 syscalls work, and there are two relevant effects: the slow path
is quite fast, and the phase-1-only path isn't really a win any more.

I suggest that we fix the hole by simplifying the code instead of making it
even more complicated. Let's back out the two-phase mechanism (but
keep the ability for arch code to supply seccomp_data) and then just
reorder it so that seccomp happens after ptrace. The result should be
considerably simpler. (We'll still have to answer the question of
what happens when a SECCOMP_RET_TRACE event changes the syscall, but
maybe the answer is to just let it through -- after all,
SECCOMP_RET_TRACE might be a request by a tracer to do its own
internal filtering.)
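
To make the ordering argument concrete, here is a toy userspace model. It
is not kernel code and does not match any real kernel interface: the
syscall numbers, the filter policy, and every toy_* name are made up for
illustration. It only shows why running the filter after the tracer has
had its say closes the hole.

```c
/* Toy userspace model of the ordering problem. NOT kernel code;
 * all names and numbers below are hypothetical. */
#include <assert.h>
#include <stdbool.h>

enum { NR_READ = 0, NR_WRITE = 1, NR_OPEN = 2 };  /* made-up syscall numbers */

/* Stand-in for a seccomp filter: allow everything except NR_OPEN. */
static bool toy_filter_allows(int nr)
{
	return nr != NR_OPEN;
}

/* Stand-in for a tracer that rewrites the syscall number at its stop. */
static int toy_tracer_rewrite(int nr)
{
	(void)nr;
	return NR_OPEN;
}

/* Filter first, then tracer: the rewritten syscall is never re-checked.
 * Returns the syscall that would run, or -1 if the filter rejected it. */
static int toy_filter_then_trace(int nr)
{
	if (!toy_filter_allows(nr))
		return -1;
	return toy_tracer_rewrite(nr);
}

/* Tracer first, then filter: the filter sees what will actually run. */
static int toy_trace_then_filter(int nr)
{
	nr = toy_tracer_rewrite(nr);
	if (!toy_filter_allows(nr))
		return -1;
	return nr;
}

/* Small self-check showing the difference between the two orderings. */
static void toy_demo(void)
{
	assert(toy_filter_then_trace(NR_READ) == NR_OPEN); /* hole: denied call runs */
	assert(toy_trace_then_filter(NR_READ) == -1);      /* reordered: rejected */
}
```

The model deliberately ignores SECCOMP_RET_TRACE, so it says nothing about
the open question above of what to do when that event changes the syscall.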