Re: [PATCH] x86/entry/64: randomize kernel stack offset upon syscall

From: Ingo Molnar
Date: Thu May 02 2019 - 11:09:21 EST



* Andy Lutomirski <luto@xxxxxxxxxx> wrote:

> Or we decide that calling get_random_bytes() is okay with IRQs off and
> this all gets a bit simpler.

BTW, before we go down this path any further: is the plan to bind this
feature to a real CPU-RNG capability, i.e. to the RDRAND instruction,
which excludes a significant group of x86 CPUs?

Because calling tens of millions of system calls per second will deplete
any non-CPU-RNG sources of entropy and will also starve all other users
of random numbers, which might have a more legitimate need for
randomness, such as the networking stack ...

I.e. I'm really *super sceptical* of this whole plan, as currently
formulated.

If we bind it to RDRAND then we shouldn't be using the generic
drivers/char/random.c pool *at all*, but just call the darn instruction
directly. This is an x86 patch-set after all, right?

Furthermore the following post suggests that RDRAND isn't a per CPU
capability, but a core or socket level facility, depending on CPU make:

https://stackoverflow.com/questions/10484164/what-is-the-latency-and-throughput-of-the-rdrand-instruction-on-ivy-bridge

8 gigabits/sec sounds like good throughput in principle, if there are no
scalability pathologies with that.

It would also be nice to know whether RDRAND does buffering *internally*,
in which case it might be better to buffer as little at the system call
level as possible, to allow the hardware RNG buffer to rebuild between
system calls.

I.e. I'd suggest retrieving randomness via a fixed number of RDRAND-r64
calls (where '1' is a perfectly valid block size - it should be
measured), and using those random bits as-is for the ~6 bits of
system call stack offset. (I'd even suggest 7 bits: that skips a full
cache line almost for free and makes the fuzz actually meaningful: no
spear attacker will take a 1/128, 0.8% chance to successfully attack a
critical system.)

Then those 64*N random bits get buffered and consumed in 5-7 bit chunks,
in a super efficient fashion, possibly inlining the fast path, totally
outside the flow of drivers/char/random.c.
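
A minimal userspace sketch of that buffering scheme, with N = 1, might
look like the code below. All names in it (demo_random64(), struct
offset_pool, next_stack_offset(), OFFSET_BITS) are hypothetical and not
taken from the patch. demo_random64() is a plain xorshift64 stand-in for
a single RDRAND-r64 call so that the sketch runs anywhere; it is not a
secure source. A real version would presumably keep the pool per-CPU.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: stand-in for one RDRAND-r64 call. xorshift64 is used only
 * so the sketch is portable; it is NOT cryptographically secure. */
static uint64_t demo_state = 0x9E3779B97F4A7C15ULL;
static unsigned long demo_refills;	/* counts 64-bit fetches */

static uint64_t demo_random64(void)
{
	uint64_t x = demo_state;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	demo_state = x;
	demo_refills++;
	return x;
}

/* Bits consumed per system call; 7 gives 128 possible offsets. */
#define OFFSET_BITS 7u

/* Hypothetical buffer of not-yet-consumed random bits. */
struct offset_pool {
	uint64_t bits;		/* remaining random bits, LSB first */
	unsigned int avail;	/* how many low bits of 'bits' are unused */
};

/* Fast path: mask, shift, subtract. Slow path: one 64-bit refill.
 * Leftover bits (64 % 7 == 1 here) are discarded on refill. */
static unsigned int next_stack_offset(struct offset_pool *p)
{
	unsigned int chunk;

	assert(p->avail <= 64);
	if (p->avail < OFFSET_BITS) {
		p->bits = demo_random64();
		p->avail = 64;
	}
	chunk = (unsigned int)(p->bits & ((1u << OFFSET_BITS) - 1));
	p->bits >>= OFFSET_BITS;
	p->avail -= OFFSET_BITS;
	return chunk;
}
```

With OFFSET_BITS set to 7, one 64-bit fetch serves nine consecutive
calls before the next refill, so the random source is touched on
roughly one call in nine.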

Any non-CPU source of randomness for system calls, and any plan to add
several extra function calls to every x86 system call, is crazy talk I
believe...

Thanks,

Ingo