RE: [PATCH] x86/entry/64: randomize kernel stack offset upon syscall

From: Reshetova, Elena
Date: Mon May 06 2019 - 03:33:05 EST


> * Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
> > Or we decide that calling get_random_bytes() is okay with IRQs off and
> > this all gets a bit simpler.
>
> BTW., before we go down this path any further, is the plan to bind this
> feature to a real CPU-RNG capability, i.e. to the RDRAND instruction,
> which excludes a significant group of x86 CPUs?

I would not like to bind this only to CPUs that have RDRAND.
That's why I was looking into using the kernel's CSPRNG (we can also use it
as a fallback when RDRAND is not available).

> Because calling tens of millions of system calls per second will deplete
> any non-CPU-RNG sources of entropy and will also starve all other users
> of random numbers, which might have a more legitimate need for
> randomness, such as the networking stack ...

This should not apply to a proper CSPRNG. Such generators of course also have a
limit on the number of bits they can safely produce (as does any crypto
primitive), but that limit is very large, and within it one user's consumption
does not affect any other user of the CSPRNG; otherwise all guarantees are broken.

> I.e. I'm really *super sceptical* of this whole plan, as currently
> formulated.
>
> If we bind it to RDRAND then we shouldn't be using the generic
> drivers/char/random.c pool *at all*, but just call the darn instruction
> directly. This is an x86 patch-set after all, right?

Yes, but my main issues with RDRAND (even if we focus strictly on x86) are:
- it is not available on older PCs
- its performance varies across the CPUs that support it (and, as I understand,
  varies quite a lot)
The last one can actually give unpleasant surprises...

> Furthermore the following post suggests that RDRAND isn't a per CPU
> capability, but a core or socket level facility, depending on CPU make:
>
> https://stackoverflow.com/questions/10484164/what-is-the-latency-and-throughput-of-the-rdrand-instruction-on-ivy-bridge
>
> 8 gigabits/sec sounds good throughput in principle, if there's no
> scalability pathologies with that.
>
> It would also be nice to know whether RDRAND does buffering *internally*,
> in which case it might be better to buffer as little at the system call
> level as possible, to allow the hardware RNG buffer to rebuild between
> system calls.

I will try asking around about concrete details of RDRAND behavior.
I have various bits and pieces I have been told, plus measurements I did, but
things don't quite add up...

>
> I.e. I'd suggest to retrieve randomness via a fixed number of RDRAND-r64
> calls (where '1' is a perfectly valid block size - it should be
> measured), which random bits are then used as-is for the ~6 bits of
> system call stack offset. (I'd even suggest 7 bits: that skips a full
> cache line almost for free and makes the fuzz actually meaningful: no
> spear attacker will take a 1/128, 0.8% chance to successfully attack a
> critical system.)
>
> Then those 64*N random bits get buffered and consumed in 5-7 bit chunk,
> in a super efficient fashion, possibly inlining the fast path, totally
> outside the flow of the drivers/char/random.c

I will ask around on what is the best way to use RDRAND for our purpose.
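
For concreteness, the buffering scheme as I understand it could be sketched
roughly as below. This is a user-space illustration only, not kernel code:
offset_pool, pool_take() and demo_source() are made-up names, the random
source is a plain function pointer standing in for RDRAND (or anything else),
and per-CPU placement, preemption and IRQ handling are all ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical entropy source: returns 64 fresh random bits.
 * A real implementation would put RDRAND or a CSPRNG behind this. */
typedef uint64_t (*rand64_fn)(void);

struct offset_pool {
	uint64_t bits;		/* unconsumed random bits, low bits used first */
	unsigned int avail;	/* number of valid bits left in 'bits' */
	rand64_fn refill;	/* called when the pool runs dry */
};

/* Take 'nbits' (1..32) random bits from the pool, refilling when fewer
 * than 'nbits' remain. Leftover bits that do not fill a whole chunk are
 * discarded to keep the fast path simple. */
static inline unsigned int pool_take(struct offset_pool *p, unsigned int nbits)
{
	unsigned int v;

	assert(nbits >= 1 && nbits <= 32);
	if (p->avail < nbits) {
		p->bits = p->refill();
		p->avail = 64;
	}
	v = (unsigned int)(p->bits & ((1ULL << nbits) - 1));
	p->bits >>= nbits;
	p->avail -= nbits;
	return v;
}

/* Deterministic stand-in source for demonstration only (NOT random). */
static uint64_t demo_source_calls;
static uint64_t demo_source(void)
{
	demo_source_calls++;
	return 0xFFFFFFFFFFFFFFFFULL;
}
```

With 7-bit chunks one 64-bit refill covers nine system calls (63 bits) and the
last bit is thrown away, so the source would be hit once every nine calls.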

>
> Any non-CPU source of randomness for system calls and plans to add
> several extra function calls to every x86 system call is crazy talk I
> believe...

So, if we go down the CPU randomness path, what do we fall back to when
RDRAND is not available? Skip randomization altogether or fall back to the
CSPRNG?
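
Either policy could sit behind the same small selection step. A rough
user-space sketch, with all names (offset_sources, get_offset_bits(), the
demo_* stubs) being hypothetical and not taken from the patch: the hardware
source is tried first, then the CSPRNG if one is configured, and otherwise
randomization is skipped. Note that RDRAND itself can fail transiently
(it reports success via the carry flag), so the hardware hook returns a bool.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical source hook: fills *out and returns true on success. */
typedef bool (*rng_try_fn)(uint64_t *out);

enum offset_src { SRC_HW, SRC_CSPRNG, SRC_NONE };

struct offset_sources {
	rng_try_fn hw;		/* e.g. an RDRAND wrapper; NULL if the CPU lacks it */
	rng_try_fn csprng;	/* NULL if the policy is "skip randomization" */
};

/* Try the hardware source, then the CSPRNG. If neither delivers,
 * report SRC_NONE and a zero value, i.e. no stack offset. */
static enum offset_src get_offset_bits(const struct offset_sources *s,
				       uint64_t *out)
{
	if (s->hw && s->hw(out))
		return SRC_HW;
	if (s->csprng && s->csprng(out))
		return SRC_CSPRNG;
	*out = 0;
	return SRC_NONE;
}

/* Demonstration stubs with fixed values (NOT random). */
static bool demo_hw_ok(uint64_t *out)   { *out = 0x1111; return true; }
static bool demo_hw_fail(uint64_t *out) { (void)out; return false; }
static bool demo_csprng(uint64_t *out)  { *out = 0x2222; return true; }
```

Setting the csprng hook to NULL gives the "skip altogether" behavior, so the
choice between the two options becomes a configuration decision rather than a
code-path difference.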

Best Regards,
Elena.