Re: [kernel-hardening] [RFC PATCH 1/1] seccomp: provide information about the previous syscall

From: Jann Horn
Date: Fri Jan 22 2016 - 05:48:43 EST


On Fri, Jan 22, 2016 at 03:30:00PM +0900, Daniel Sangorrin wrote:
> This patch allows applications to restrict the order in which
> its system calls may be requested. In order to do that, we
> provide seccomp-BPF scripts with information about the
> previous system call requested.
>
> An example use case consists of detecting (and stopping) return
> oriented attacks that disturb the normal execution flow of
> a user program.


The intent here is to mitigate attacks in which an attacker has
e.g. a function pointer overwrite without a high degree of stack
control or the ability to perform a stack pivot, correct? So that
e.g. a one-gadget system() call won't succeed?

Do you have data on how effective this protection is using just
the previous system call number?

I think that for example, the "magic ROP gadget" in glibc that
can be used given just a single pointer overwrite and stdin
control (https://gist.github.com/zachriggle/ca24daf4e8be953a3f96),
which (as far as I can tell) is in the middle of the system()
implementation, could be used as long as a transition to one of
the following syscalls is allowed:

- rt_sigaction
- rt_sigprocmask
- clone
- execve

I'm not sure how many interesting syscalls typically transition
to that, perhaps you can comment on that?

However, when exploiting network servers, this magic gadget
won't help much - an attacker would probably have to either
call into an interesting function in the application or use
ROP. In the latter case, this protection won't help much
either: most syscalls just return -EFAULT / -EINVAL when you
supply nonsense arguments, so ROPping through a "pop rax;ret"
gadget and a "syscall;ret" gadget lets the attacker issue
harmless dummy syscalls until the filter has been walked into
a state from which the desired syscall is allowed. There are
a bunch of occurrences of both gadgets in Debian's libc (and
these are just the trivial ones):

$ hexdump -C /lib/x86_64-linux-gnu/libc-2.19.so | grep '58 c3'
000382e0 00 00 48 8b 00 5b 8b 40 58 c3 48 8d 05 4f 8a 36 |..H..[.@X.H..O.6|
000383b0 58 c3 48 8d 05 87 89 36 00 48 39 c3 74 0e 48 89 |X.H....6.H9.t.H.|
00038450 40 58 c3 48 8d 05 e6 88 36 00 48 39 c3 74 0e 48 |@X.H....6.H9.t.H|
000d9a00 48 89 44 24 18 e8 56 ff ff ff 48 83 c4 58 c3 90 |H.D$..V...H..X..|
000e51d0 c3 0f 1f 80 00 00 00 00 48 8b 40 58 c3 0f 1f 00 |........H.@X....|
000ea2f0 48 83 3d 58 c3 2b 00 00 48 8b 1d 69 8b 2b 00 64 |H.=X.+..H..i.+.d|
00160520 48 c3 fa ff 58 c3 fa ff 68 c3 fa ff 80 c3 fa ff |H...X...h.......|
00171470 58 c3 f8 ff 84 60 02 00 74 c3 f8 ff 94 62 02 00 |X....`..t....b..|
$ hexdump -C /lib/x86_64-linux-gnu/libc-2.19.so | grep '0f 05 c3'
000b85b0 b8 6e 00 00 00 0f 05 c3 0f 1f 84 00 00 00 00 00 |.n..............|
000b85c0 b8 66 00 00 00 0f 05 c3 0f 1f 84 00 00 00 00 00 |.f..............|
000b85d0 b8 6b 00 00 00 0f 05 c3 0f 1f 84 00 00 00 00 00 |.k..............|
000b85e0 b8 68 00 00 00 0f 05 c3 0f 1f 84 00 00 00 00 00 |.h..............|
000b85f0 b8 6c 00 00 00 0f 05 c3 0f 1f 84 00 00 00 00 00 |.l..............|
000b87f0 b8 6f 00 00 00 0f 05 c3 0f 1f 84 00 00 00 00 00 |.o..............|
000d9260 b8 5f 00 00 00 0f 05 c3 0f 1f 84 00 00 00 00 00 |._..............|
000e6400 b8 e4 00 00 00 0f 05 c3 0f 1f 84 00 00 00 00 00 |................|
000fff60 48 63 3f b8 03 00 00 00 0f 05 c3 0f 1f 44 00 00 |Hc?..........D..|
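
Note that grepping hexdump output only finds byte sequences that sit
entirely within one 16-byte output line, so it undercounts. A rough
sketch of a raw byte scan that does not have this limitation follows.
The names find_gadgets and GADGETS are made up for illustration, and
the scan does not check whether a hit lies in an executable segment,
so treat the result as a list of candidates only.

```python
# Illustrative sketch: locate candidate ROP gadget byte sequences.
# find_gadgets and GADGETS are hypothetical names, not part of the patch.

GADGETS = {
    "pop rax; ret": bytes([0x58, 0xC3]),
    "syscall; ret": bytes([0x0F, 0x05, 0xC3]),
}


def find_gadgets(blob, gadgets=GADGETS):
    """Return {gadget name: [byte offsets]} for every occurrence in blob."""
    hits = {}
    for name, pattern in gadgets.items():
        offsets = []
        pos = blob.find(pattern)
        while pos != -1:
            offsets.append(pos)
            # Continue one byte further so overlapping hits are not skipped.
            pos = blob.find(pattern, pos + 1)
        hits[name] = offsets
    return hits


if __name__ == "__main__":
    # Bytes copied from the 000b85b0 line of the hexdump output above.
    sample = bytes.fromhex("b86e0000000f05c30f1f840000000000")
    for name, offsets in find_gadgets(sample).items():
        print(name, [hex(o) for o in offsets])
```

To scan a real library, read it with open(path, "rb").read() and pass
the resulting bytes to find_gadgets().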

So an attacker would craft the stack like this:
[pop rax;ret address]
[number of first syscall for transition]
[syscall;ret address]
[pop rax;ret address]
[number of second syscall for transition]
[syscall;ret address]
[...]
[normal ROP for whatever the attacker wants to do]
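
To make the argument above concrete, here is a toy model of a
previous-syscall policy. Everything in it is an assumption for
illustration: the ALLOWED table is invented and not taken from any real
program, and the model assumes the recorded previous syscall advances
even when the syscall itself fails with -EFAULT / -EINVAL. Under those
assumptions, finding the dummy syscalls to issue is just a shortest-path
search over the allowed transitions.

```python
from collections import deque

# Hypothetical policy: ALLOWED[prev] is the set of syscalls the filter
# permits directly after prev. Names and edges are invented.
ALLOWED = {
    "read": {"read", "write"},
    "write": {"read", "open"},
    "open": {"read", "mmap"},
    "mmap": {"execve"},
}


def shortest_walk(allowed, prev, target):
    """Return the shortest syscall sequence from prev to target, or None.

    Each element of the result is a syscall the filter would accept at
    that point; all but the last can be issued with nonsense arguments.
    """
    queue = deque([(prev, [])])
    seen = {prev}
    while queue:
        node, path = queue.popleft()
        for nxt in sorted(allowed.get(node, ())):
            if nxt == target:
                return path + [nxt]
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None


if __name__ == "__main__":
    print(shortest_walk(ALLOWED, "read", "execve"))
```

In this model the policy only stops the attacker if no path exists at
all, which is a much stronger requirement than forbidding the direct
transition.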


Maybe someone who knows a bit more about binary exploitation
can comment on this, especially on how likely it is that an
attacker can successfully manipulate a network service's
program flow without ROP in the presence of full ASLR and so on.


Also, there is a potential functional issue: What about signal handlers?
As far as I can tell, a signal can be delivered after any syscall, so the
filter would have to allow transitions from every syscall to whichever
syscall occurs first in each signal handler.


> @@ -443,6 +448,11 @@ static long seccomp_attach_filter(unsigned int flags,
>  			return ret;
>  	}
>
> +	/* Initialize the prev_nr field only once */
> +	if (current->seccomp.filter == NULL)
> +		current->seccomp.prev_nr =
> +			syscall_get_nr(current, task_pt_regs(current));
> +
>  	/*
>  	 * If there is an existing filter, make it the prev and don't drop its
>  	 * task reference.

What about SECCOMP_FILTER_FLAG_TSYNC? When a thread is transitioned from
SECCOMP_MODE_DISABLED to SECCOMP_MODE_FILTER by another thread, its initial
prev_nr will be 0, which would e.g. appear to be the read() syscall on
x86_64, right?
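
A toy userspace model of the concern, in case it helps the discussion.
This is not kernel code: ToyTask, attach_filter and tsync are invented
names, and the model simply assumes that prev_nr starts out
zero-initialized and that the TSYNC path copies the filter to sibling
threads without touching their prev_nr.

```python
NR_READ = 0       # __NR_read on x86_64
NR_SECCOMP = 317  # __NR_seccomp on x86_64


class ToyTask:
    """Hypothetical stand-in for the per-thread seccomp state."""

    def __init__(self):
        self.filter = None
        self.prev_nr = 0  # assumed zero-initialized


def attach_filter(task, current_nr, new_filter):
    # Mirrors the quoted hunk: set prev_nr only on the first attach.
    if task.filter is None:
        task.prev_nr = current_nr
    task.filter = new_filter


def tsync(caller, siblings):
    # Assumed behaviour: siblings receive the filter, prev_nr is untouched.
    for task in siblings:
        task.filter = caller.filter


if __name__ == "__main__":
    caller, sibling = ToyTask(), ToyTask()
    attach_filter(caller, NR_SECCOMP, "filter")
    tsync(caller, [sibling])
    print("caller prev_nr:", caller.prev_nr)
    print("sibling prev_nr:", sibling.prev_nr,
          "(indistinguishable from read)" if sibling.prev_nr == NR_READ else "")
```

In this model the sibling's first filtered syscall is evaluated as if
it followed read(), although the thread may never have called read().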
