RE: [RFC PATCH] x86/entry/64: randomize kernel stack offset upon syscall

From: Reshetova, Elena
Date: Fri Mar 29 2019 - 03:52:29 EST


> On Thu, Mar 28, 2019 at 9:29 AM Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> > Doesn't this just leak some of the canary to user code through side channels?
>
> Erf, yes, good point. Let's just use prandom and be done with it.

And here I have some numbers on this: prandom turned out to be pretty
fast, even when called on every syscall. See the numbers below:

1) lmbench: ./lat_syscall -N 1000000 null
base: Simple syscall: 0.1774 microseconds
random_offset (prandom_u32() every syscall): Simple syscall: 0.1822 microseconds
random_offset (prandom_u32() every 4th syscall): Simple syscall: 0.1844 microseconds

2) Andy's tests, misc-tests: ./timing_test_64 10M sys_enosys
base: 10000000 loops in 1.62224s = 162.22 nsec / loop
random_offset (prandom_u32() every syscall): 10000000 loops in 1.64660s = 164.66 nsec / loop
random_offset (prandom_u32() every 4th syscall): 10000000 loops in 1.69304s = 169.30 nsec / loop

The second case is the one where prandom is called only once every 4 syscalls
and the unused random bits are preserved in a per-cpu buffer. As you can see,
it is actually slower (modulo my maybe-not-so-optimized code in prandom, see
below) than calling it every time, so I would vote for calling it on every
syscall: that saves the hassle and avoids additional code in prandom.
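
For context, the "every syscall" variant boils down to something like the
sketch below; the helper name, the offset mask, and the alloca()-based stack
adjustment are only illustrative here, not the exact code from the patch:

#include <linux/kernel.h>
#include <linux/random.h>	/* prandom_u32() */

/* Assumed for illustration: up to 1KB of jitter, 16-byte aligned. */
#define KSTACK_OFFSET_MASK	0x3f0

static __always_inline void randomize_kstack_offset(void)
{
	u32 offset = prandom_u32() & KSTACK_OFFSET_MASK;
	/*
	 * Move the stack pointer down by 'offset' bytes for the rest
	 * of this syscall; the empty asm keeps the compiler from
	 * optimizing the alloca away.
	 */
	char *ptr = __builtin_alloca(offset);

	asm volatile("" : : "r" (ptr) : "memory");
}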

Below is what I was calling instead of prandom_u32() in order to preserve the
unused random bits (net_rand_state_buffer is a new per-cpu buffer I added to
store them). Note that I didn't include the check for bytes >= sizeof(u32),
since this was just a PoC to test the base speed; the generic case would need it.

+void prandom_bytes_preserve(void *buf, size_t bytes)
+{
+	u32 *buffer = &get_cpu_var(net_rand_state_buffer);
+	u8 *ptr = buf;
+
+	if (!(*buffer)) {
+		/* Buffer exhausted: refill it from the per-cpu PRNG state. */
+		struct rnd_state *state = &get_cpu_var(net_rand_state);
+
+		if (bytes > 0) {
+			*buffer = prandom_u32_state(state);
+			do {
+				*ptr++ = (u8) *buffer;
+				bytes--;
+				*buffer >>= BITS_PER_BYTE;
+			} while (bytes > 0);
+		}
+		put_cpu_var(net_rand_state);
+		put_cpu_var(net_rand_state_buffer);
+	} else {
+		/* Serve the request from the previously saved bits. */
+		if (bytes > 0) {
+			do {
+				*ptr++ = (u8) *buffer;
+				bytes--;
+				*buffer >>= BITS_PER_BYTE;
+			} while (bytes > 0);
+		}
+		put_cpu_var(net_rand_state_buffer);
+	}
+}
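
A caller on the syscall path would then look roughly like the snippet below,
assuming one byte of offset is consumed per syscall, which is what makes the
PRNG state advance only on every 4th call (a u32 buffer holds 4 bytes):

	u8 offset;

	/* One byte per syscall; the u32 buffer lasts four syscalls. */
	prandom_bytes_preserve(&offset, sizeof(offset));
	/* mask/align 'offset' and apply it to the stack as sketched above */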

I will send the first version of the patch (calling prandom_u32() every time)
shortly, in case anyone wants to double-check the performance implications.

Best Regards,
Elena.