Re: [PATCH 1/5] random: fix crng_ready() test

From: Theodore Y. Ts'o
Date: Thu May 17 2018 - 15:57:17 EST


On Wed, May 16, 2018 at 05:07:08PM -0700, Srivatsa S. Bhat wrote:
>
> On a Photon OS VM running on VMware ESXi, this patch causes a boot speed
> regression of 5 minutes :-( [ The VM doesn't have haveged or rng-tools
> (rngd) installed. ]
>
> [ 1.420246] EXT4-fs (sda2): re-mounted. Opts: barrier,noacl,data=ordered
> [ 1.469722] tsc: Refined TSC clocksource calibration: 1900.002 MHz
> [ 1.470707] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x36c65c1a9e1, max_idle_ns: 881590695311 ns
> [ 1.474249] clocksource: Switched to clocksource tsc
> [ 1.584427] systemd-journald[216]: Received request to flush runtime journal from PID 1
> [ 346.620718] random: crng init done
>
> Interestingly, the boot delay is exacerbated on VMs with large amounts
> of RAM. For example, the delay is not so noticeable (< 30 seconds) on a
> VM with 2GB memory, but goes up to 5 minutes on an 8GB VM.
>
> Also, cloud-init-local.service seems to be the one blocking for entropy
> here.

So the first thing I'd ask you to investigate is what the heck
cloud-init-local.service is doing, and why it really needs
cryptographic grade random numbers?
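
As a starting point for that investigation, here is a minimal sketch (not
anything cloud-init itself ships) for checking whether a getrandom(2)
caller would block at a given point in boot. It assumes Linux and
Python 3.6 or later, where os.getrandom() and os.GRND_NONBLOCK are
available; the helper name crng_ready() is made up for illustration.

```python
import os


def crng_ready() -> bool:
    """Return True if the kernel CRNG is initialized, without blocking.

    With GRND_NONBLOCK, getrandom(2) fails with EAGAIN instead of waiting
    for "crng init done"; Python surfaces that as BlockingIOError.
    """
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
    except BlockingIOError:
        return False
    return True


if __name__ == "__main__":
    print("crng ready:", crng_ready())
```

Running something like this early in boot, before cloud-init-local
starts, should show whether that service is the first thing asking for
entropy the kernel doesn't have yet.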

> It would be great if this CVE can be fixed somehow without causing boot speed
> to spike from ~20 seconds to 5 minutes, as that makes the system pretty much
> unusable. I can workaround this by installing haveged, but ideally an in-kernel
> fix would be better. If you need any other info about my setup or if you have
> a patch that I can test, please let me know!

So the question is why strong random numbers are needed by
cloud-init-local, and how much you trust haveged and/or
RDRAND (which is what you will be depending upon if you install
rng-tools). If you believe that Intel and/or the NSA haven't
backdoored RDRAND[1], or you believe that because the internal cache
architecture of Intel processors isn't publicly documented, and it's
Soooooooo complicated, no one can figure it out (which is what you
will be depending upon if you choose haveged), be my guest. I
personally consider the latter to be "security via obscurity", but
others have different opinions.

[1] As an aside, read the best paper from the 37th IEEE Symposium on
Security and Privacy and weep:

https://www.computerworld.com/article/3079417/security/researchers-built-devious-undetectable-hardware-level-backdoor-in-computer-chips.html

If it turns out that the security of your VM is critically dependent
on what cloud-init-local.service is doing, and you don't like making
those assumptions, then you may need to ask VMware to make
virtio-rng available to guest OS's under ESXi, and then make Photon OS
depend on a secure RNG available to the host OS.
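
To see whether a guest has been given a hardware RNG such as virtio-rng,
something like the following rough sketch may help. It assumes the
kernel's hw_random core is enabled and exposes its usual sysfs
attributes (rng_available and rng_current under
/sys/class/misc/hw_random); the helper names are hypothetical.

```python
from pathlib import Path

# Assumed sysfs location of the hw_random core's attributes.
HW_RANDOM = Path("/sys/class/misc/hw_random")


def read_attr(name, base=HW_RANDOM):
    """Read one sysfs attribute; return "" if it is missing or unreadable."""
    try:
        return (base / name).read_text().strip()
    except OSError:
        return ""


def hwrng_status(base=HW_RANDOM):
    """Return the hardware RNGs the kernel lists and the one in use."""
    return {
        "available": read_attr("rng_available", base).split(),
        "current": read_attr("rng_current", base),
    }


if __name__ == "__main__":
    print(hwrng_status())
```

An empty "available" list would suggest the hypervisor isn't offering
the guest any hardware RNG at all.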

Best regards,

- Ted