RE: [PATCH] x86/entry/64: randomize kernel stack offset upon syscall

From: Reshetova, Elena
Date: Fri Apr 26 2019 - 07:33:18 EST


> Hi,
>
> Sorry for the delay - Easter holidays + I was trying to wrap my head around the
> proposed options.
> Here is what I think our options are with regard to the source of randomness:
>
> 1) rdtsc or variations based on it (David proposed some CRC-based variants for
> example)
> 2) prandom-based options
> 3) some proper crypto (chacha8, for example, seems to be the lightest of the
> existing options and is probably enough for our purpose, but it looks like the
> kernel has only chacha20)
> 4) rdrand or other HW-based crypto
>
> Option 4 was measured to be too heavy for the purpose:
> base:                   Simple syscall: 0.1774 microseconds
> random_offset (rdtsc):  Simple syscall: 0.1803 microseconds
> random_offset (rdrand): Simple syscall: 0.3702 microseconds
>
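
To put the numbers above in relative terms, here is a small user-space
helper (the name overhead_pct is made up for this mail, it is not part of
the patch). With the figures quoted above it gives roughly 1.6% extra
cost for the rdtsc variant and roughly 109% for the rdrand variant.

```c
#include <assert.h>

/*
 * Relative overhead, in percent, of a measured syscall latency over the
 * baseline latency. Both values are in microseconds.
 * Hypothetical helper for illustration only.
 */
static double overhead_pct(double base_us, double measured_us)
{
	assert(base_us > 0.0);
	return (measured_us - base_us) / base_us * 100.0;
}
```

For example, overhead_pct(0.1774, 0.1803) is the rdtsc case and
overhead_pct(0.1774, 0.3702) is the rdrand case.
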
>
> Option 2 (even if we fork our own state(s), do it per-cpu, reseed, etc.) starts to
> look to me like the least desirable one.
> The existing generator's state, as people mentioned before, is trivially solvable
> given a very small number of equations (syscalls to issue and offsets to leak, in
> our case).
> Even if we isolate the state/seed to just this purpose of stack randomization (and
> don't leak anything about the rest of the system or net prandom usage), it still
> probably makes the randomization more easily solvable than some constructs based
> on the lower bits of rdtsc.
> In addition, building on top of the existing kernel LFSR would add more code
> (probably not useful for any other purpose), a possible misconception that it can
> be used for "real security", etc. So, I would propose to abandon this idea.
>
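
To illustrate why a linear generator is easy to solve, here is a toy
sketch. It is NOT the kernel's prandom generator: the 32-bit Fibonacci
LFSR, the tap mask and all names are assumptions picked only to keep the
example short, and it leaks one bit per step rather than a full offset.
In this toy case the leaked bits are the state itself; for a real
generator one would solve a small linear system over GF(2) instead, but
the principle is the same.

```c
#include <assert.h>
#include <stdint.h>

/* Arbitrary illustrative tap mask; not taken from any kernel generator. */
#define TOY_TAPS 0x80200003u

/* Parity (XOR of all bits) of a 32-bit word. */
static uint32_t parity32(uint32_t v)
{
	v ^= v >> 16;
	v ^= v >> 8;
	v ^= v >> 4;
	v ^= v >> 2;
	v ^= v >> 1;
	return v & 1u;
}

/* Advance the toy LFSR by one step; the bit shifted out is the "leak". */
static uint32_t toy_lfsr_step(uint32_t *state)
{
	uint32_t out = *state & 1u;
	uint32_t fb = parity32(*state & TOY_TAPS);

	*state = (*state >> 1) | (fb << 31);
	return out;
}

/*
 * Attacker side: 32 consecutive leaked bits are exactly the 32 bits of
 * the state as it was before the first leak, least significant bit first.
 */
static uint32_t toy_lfsr_recover(const uint8_t leaked[32])
{
	uint32_t state = 0;
	int i;

	assert(leaked);
	for (i = 0; i < 32; i++)
		state |= (uint32_t)(leaked[i] & 1u) << i;
	return state;
}
```

After recovering the state, the attacker steps the copy forward by the
same 32 steps and from then on predicts every further output.
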
> Option 3 we have to measure, I guess, but if it is as heavy as rdrand, then it is
> also out.

Adding Eric and Herbert to continue the discussion of the chacha part.
So, as a short summary: I am trying to find a fast (fast enough to be used on every
syscall invocation) source of random bits with good enough security properties.
I started to look into the kernel's chacha implementation, and while it seems to be
designed to work with any number of rounds, it does not expose a primitive with fewer
than 12 rounds.
I guess this is done for security's sake, since 12 is probably the lowest bound we want
people to use for encryption/decryption, but if we are to build an efficient RNG,
chacha8 is probably a good tradeoff between security and speed.
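
To make the "any number of rounds" point concrete, below is a minimal
user-space sketch of the ChaCha block function with the round count as a
parameter. It is not the kernel code and not a proposed interface; the
name chacha_block_sketch and its signature are made up for this mail.
The caller supplies the 16-word input state (constants, key, counter,
nonce); passing 8, 12 or 20 selects chacha8, chacha12 or chacha20.

```c
#include <assert.h>
#include <stdint.h>

static inline uint32_t rotl32(uint32_t v, int n)
{
	return (v << n) | (v >> (32 - n));
}

/* One ChaCha quarter round on four words of the working state. */
#define QR(a, b, c, d) do {			\
	a += b; d ^= a; d = rotl32(d, 16);	\
	c += d; b ^= c; b = rotl32(b, 12);	\
	a += b; d ^= a; d = rotl32(d, 8);	\
	c += d; b ^= c; b = rotl32(b, 7);	\
} while (0)

/*
 * Hypothetical helper: compute one 16-word ChaCha output block from the
 * input state `in` using `nrounds` rounds (must be even and positive).
 */
static void chacha_block_sketch(const uint32_t in[16], uint32_t out[16],
				int nrounds)
{
	uint32_t x[16];
	int i;

	assert(nrounds > 0 && nrounds % 2 == 0);

	for (i = 0; i < 16; i++)
		x[i] = in[i];

	/* Each iteration is a double round: 4 column + 4 diagonal QRs. */
	for (i = 0; i < nrounds; i += 2) {
		QR(x[0], x[4], x[8],  x[12]);
		QR(x[1], x[5], x[9],  x[13]);
		QR(x[2], x[6], x[10], x[14]);
		QR(x[3], x[7], x[11], x[15]);
		QR(x[0], x[5], x[10], x[15]);
		QR(x[1], x[6], x[11], x[12]);
		QR(x[2], x[7], x[8],  x[13]);
		QR(x[3], x[4], x[9],  x[14]);
	}

	/* Feed-forward: add the input state to the permuted state. */
	for (i = 0; i < 16; i++)
		out[i] = x[i] + in[i];
}
```

The per-block cost scales roughly linearly with nrounds, which is why
the 8-round variant is the interesting one to measure against rdrand.
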

What are people's opinions/perceptions on this? Has creating a kernel RNG based on
chacha been considered before?

Best Regards,
Elena.