Re: [PATCH] x86/entry/64: randomize kernel stack offset upon syscall

From: Ingo Molnar
Date: Tue Apr 16 2019 - 03:34:52 EST



* Reshetova, Elena <elena.reshetova@xxxxxxxxx> wrote:

> > 4)
> >
> > But before you tweak the patch, a more fundamental question:
> >
> > Does the stack offset have to be per *syscall execution* randomized?
> > Which threats does this protect against that a simpler per task syscall
> > random offset wouldn't protect against?
>
> We *really* need it per syscall. If you take a look at the recent stack attacks
> [1],[2],[3],[4], they all do some initial probing via syscalls first to discover stack addresses
> or leftover data on the stack (or to pre-populate the stack with some attacker-controlled data),
> and then in a following syscall execute the actual attack (leak data, use
> the pre-populated data for execution, etc.). If the offset stays the same for the
> lifetime of the task, it can easily be recovered during this initial probing phase, and
> then nothing changes for the attacker.
>
> [1] Kernel Exploitation Via Uninitialized Stack, 2011
> https://www.defcon.org/images/defcon-19/dc-19-presentations/Cook/DEFCON-19-Cook-Kernel-Exploitation.pdf
> [2] Stackjacking, 2011, https://jon.oberheide.org/files/stackjacking-infiltrate11.pdf
> [3] The Stack is Back, 2012, https://jon.oberheide.org/files/infiltrate12-thestackisback.pdf
> [4] Exploiting Recursion in the Linux Kernel, 2016,
> https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html
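The per-syscall scheme argued for above can be sketched in userspace C. This is purely illustrative, not the patch itself: fake_random(), syscall_entry() and do_syscall_body() are stand-in names, and alloca() stands in for whatever mechanism the real entry code uses to shift the stack pointer.

```c
#include <stdint.h>
#include <alloca.h>

static uintptr_t probe_addr;          /* records where a local landed */

/* Stand-in entropy source -- NOT the patch's PRNG; it merely returns
 * a different value per call so the sketch is deterministic. */
static uint32_t fake_random(void)
{
	static uint32_t ctr;
	return ctr += 7;
}

static void do_syscall_body(void)
{
	char local;
	probe_addr = (uintptr_t)&local;   /* stack depth seen by the body */
}

/* Per-syscall variant: pick a fresh 5-bit offset on every entry, so
 * leftovers from the previous syscall no longer sit at a predictable
 * depth relative to the current frame. */
static void syscall_entry(void)
{
	uint32_t off = (fake_random() & 0x1fu) * 8;  /* 0..248 bytes, 8-aligned */
	volatile char *pad = alloca(off + 8);        /* shift the stack down */
	pad[0] = 0;                                  /* keep the allocation live */
	do_syscall_body();
}
```

With a per-task offset, two calls to syscall_entry() would place do_syscall_body()'s frame at the same depth; here each call lands it somewhere new.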

Yeah, so if there's an information leak from the kernel stack, don't we
now effectively store 5 PRNG bits there on every syscall, allowing
systematic probing of the generic PRNG?

The kernel can execute millions of syscalls per second, so I'm pretty
sure there's a statistical attack against:

* This is a maximally equidistributed combined Tausworthe generator
* based on code from GNU Scientific Library 1.5 (30 Jun 2004)
*
* lfsr113 version:
*
* x_n = (s1_n ^ s2_n ^ s3_n ^ s4_n)
*
* s1_{n+1} = (((s1_n & 4294967294) << 18) ^ (((s1_n << 6) ^ s1_n) >> 13))
* s2_{n+1} = (((s2_n & 4294967288) << 2) ^ (((s2_n << 2) ^ s2_n) >> 27))
* s3_{n+1} = (((s3_n & 4294967280) << 7) ^ (((s3_n << 13) ^ s3_n) >> 21))
* s4_{n+1} = (((s4_n & 4294967168) << 13) ^ (((s4_n << 3) ^ s4_n) >> 12))
*
* The period of this generator is about 2^113 (see erratum paper).

... which recovers the real PRNG state much faster than the ~60-second
reseeding interval and allows the prediction of the next stack offset?
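For reference, the quoted recurrences can be written out directly as a step function (a userspace rendering of what lib/random32.c's prandom_u32_state() computes; the state layout and masking to a 5-bit offset here are illustrative):

```c
#include <stdint.h>

/* lfsr113 step, transcribed from the recurrences quoted above:
 * each s_i is masked, shifted, and XOR-folded; the output x_n is
 * the XOR of the four state words. */
struct rnd_state { uint32_t s1, s2, s3, s4; };

static uint32_t prandom_u32_state(struct rnd_state *st)
{
	st->s1 = ((st->s1 & 4294967294U) << 18) ^ (((st->s1 << 6)  ^ st->s1) >> 13);
	st->s2 = ((st->s2 & 4294967288U) << 2)  ^ (((st->s2 << 2)  ^ st->s2) >> 27);
	st->s3 = ((st->s3 & 4294967280U) << 7)  ^ (((st->s3 << 13) ^ st->s3) >> 21);
	st->s4 = ((st->s4 & 4294967168U) << 13) ^ (((st->s4 << 3)  ^ st->s4) >> 12);
	return st->s1 ^ st->s2 ^ st->s3 ^ st->s4;
}

/* A 5-bit stack offset would expose the low bits of each output: */
static uint32_t stack_offset(struct rnd_state *st)
{
	return prandom_u32_state(st) & 0x1f;
}
```

The whole transformation is linear over GF(2), which is what makes recovering the 113-bit state from enough leaked output bits a straightforward linear-algebra problem rather than a brute-force search.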

I.e. I don't see how kernel stack PRNG randomization protects against
information leaks from the kernel stack. By putting PRNG information into
the kernel stack for *every* system call we add a broad attack surface:
any obscure ioctl information leak can now be escalated into an attack
against the net_rand_state PRNG, right?

> No, the above numbers are with CONFIG_PAGE_TABLE_ISOLATION=y for x86_64;
> I will test with CONFIG_PAGE_TABLE_ISOLATION turned off from now on
> also.

Thanks!

Ingo