RE: [PATCH] x86/entry/64: randomize kernel stack offset upon syscall

From: Reshetova, Elena
Date: Wed May 08 2019 - 07:20:02 EST


..
> > rdrand (calling every 8 syscalls): Simple syscall: 0.0795 microseconds
>
> You could try something like:
> 	u64 rand_val = cpu_var->syscall_rand;
>
> 	while (unlikely(rand_val == 0))
> 		rand_val = rdrand64();
>
> 	stack_offset = rand_val & 0xff;
> 	rand_val >>= 6;
> 	if (likely(rand_val >= 4))
> 		cpu_var->syscall_rand = rand_val;
> 	else
> 		cpu_var->syscall_rand = rdrand64();
>
> 	return stack_offset;
>
> That gives you 10 system calls per rdrand instruction
> and mostly takes the latency out of line.

I have experimented more (including the version above) and here are
more stable numbers:

CONFIG_PAGE_TABLE_ISOLATION=y:

base:                                  Simple syscall: 0.1761 microseconds
get_random_bytes (4096-byte buffer):   Simple syscall: 0.1793 microseconds
rdrand (every 10 syscalls):            Simple syscall: 0.1905 microseconds
rdrand (every 8 syscalls):             Simple syscall: 0.1980 microseconds

CONFIG_PAGE_TABLE_ISOLATION=n:

base:                                  Simple syscall: 0.0510 microseconds
get_random_bytes (4096-byte buffer):   Simple syscall: 0.0597 microseconds
rdrand (every 10 syscalls):            Simple syscall: 0.0719 microseconds
rdrand (every 8 syscalls):             Simple syscall: 0.0783 microseconds

So, purely speed-wise, get_random_bytes() with a one-page per-CPU buffer wins.
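
To put the numbers above in relative terms, here is a minimal sketch that
computes the overhead of each variant over the baseline. The helper names
(overhead_percent, print_overhead) are made up for illustration and are not
part of the patch or the benchmark.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Relative cost of a measured variant over the baseline, in percent.
 * Both arguments are per-syscall latencies in microseconds.
 */
double overhead_percent(double base_us, double variant_us)
{
	assert(base_us > 0.0);
	return (variant_us - base_us) / base_us * 100.0;
}

/* Print one line of the comparison (hypothetical helper). */
void print_overhead(const char *name, double base_us, double variant_us)
{
	printf("%-36s +%.4f us (%.1f%%)\n", name,
	       variant_us - base_us, overhead_percent(base_us, variant_us));
}
```

Applied to the figures above, this gives roughly 1.8%, 8.2% and 12.4% overhead
with CONFIG_PAGE_TABLE_ISOLATION=y, and roughly 17.1%, 41.0% and 53.5% with
CONFIG_PAGE_TABLE_ISOLATION=n (get_random_bytes, rdrand every 10 syscalls,
rdrand every 8 syscalls, in that order).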

Also, I haven't yet found anyone with in-depth knowledge of the generator
behind rdrand, but according to the public design docs it does indeed have
internal buffering. My understanding is that as long as data is available in
this internal buffer (shared between all CPUs), the rdrand instruction is fast,
but if the buffer needs refilling, it is slow.
However, you can only request a full register's worth of randomness from it
(you can't ask for just 5 bits, for example), so a strategy of requesting one
full 64-bit register and storing the result in a per-CPU buffer seems
reasonable. The only other way I can think of is to run rdrand several times
in a row within one syscall to fill a bigger buffer and then use bits from
there. I can measure this to see whether it is faster (I doubt it).
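
The "one 64-bit word, consumed a few bits at a time" strategy can be sketched
in plain userspace C as follows. This is only an illustration under stated
assumptions, not the kernel code: struct offset_cache stands in for the
per-CPU variable, the entropy source is passed in as a function pointer in
place of rdrand64(), and demo_rand64() is a deterministic stand-in that
returns a fixed pattern so the behaviour can be checked. Unlike the quoted
snippet, it uses an explicit counter instead of the value itself as the
"empty" marker, and a 6-bit mask that matches the 6-bit shift.

```c
#include <stdint.h>

#define OFFSET_BITS	6
#define OFFSET_MASK	((1u << OFFSET_BITS) - 1)
#define DRAWS_PER_WORD	(64 / OFFSET_BITS)	/* 10 draws, 4 bits unused */

/* Assumed abstraction of the entropy source (rdrand64() in the thread). */
typedef uint64_t (*rand64_fn)(void);

/* Stand-in for the per-CPU state; zero-initialised means "empty". */
struct offset_cache {
	uint64_t bits;			/* unused random bits */
	unsigned int remaining;		/* draws left in 'bits' */
};

/* Return the next 6-bit offset, refilling once every DRAWS_PER_WORD calls. */
unsigned int next_stack_offset(struct offset_cache *c, rand64_fn rand64)
{
	unsigned int offset;

	if (c->remaining == 0) {
		c->bits = rand64();
		c->remaining = DRAWS_PER_WORD;
	}

	offset = (unsigned int)(c->bits & OFFSET_MASK);
	c->bits >>= OFFSET_BITS;
	c->remaining--;
	return offset;
}

/*
 * Deterministic stand-in for rdrand64(), for illustration only: it is NOT
 * random. It counts how often the "slow" source is hit.
 */
unsigned int refill_calls;

uint64_t demo_rand64(void)
{
	refill_calls++;
	return 0xfedcba9876543210ULL;
}
```

With this layout the source is called once per 10 offsets, which matches the
"every 10 syscalls" variant measured above. Note that this sketch refills
inline on the call that finds the cache empty, whereas the quoted version
refills right after consuming the last usable bits.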

Best Regards,
Elena.