Re: [PATCH] random: Don't overwrite CRNG state in crng_initialize()

From: Theodore Ts'o
Date: Wed Feb 08 2017 - 23:19:44 EST

On Wed, Feb 08, 2017 at 08:31:26PM -0700, Alden Tondettar wrote:
> The new non-blocking system introduced in commit e192be9d9a30 ("random:
> replace non-blocking pool with a Chacha20-based CRNG") can under
> some circumstances report itself initialized while it still contains
> dangerously little entropy, as follows:
> Approximately every 64th call to add_interrupt_randomness(), the "fast"
> pool of interrupt-timing-based entropy is fed into one of two places. At
> calls numbered <= 256, the fast pool is XORed into the primary CRNG state.
> At call 256, the CRNG is deemed initialized, getrandom(2) is unblocked,
> and reading from /dev/urandom no longer gives warnings.
> At calls > 256, the fast pool is fed into the input pool, leaving the CRNG
> untouched.
> The problem arises between call number 256 and 320. If crng_initialize()
> is called at this time, it will overwrite the _entire_ CRNG state with
> 48 bytes generated from the input pool.
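
For readers following along, the call counting described in the quoted
text can be modelled in a few lines of user-space C. This is only a toy
sketch of the description above, not the code in drivers/char/random.c:
the names fast_pool_dest(), in_overwrite_window(), FLUSH_INTERVAL and
CRNG_READY_CALL are hypothetical, the "approximately every 64th call" is
simplified to exactly every 64th, and the window bounds are one reading
of "between call number 256 and 320".

```c
#include <stdbool.h>

/* Toy model of the quoted description; all names here are hypothetical. */
enum fast_pool_dest {
    DEST_NONE,       /* fast pool is not flushed on this call */
    DEST_CRNG,       /* fast pool is XORed into the primary CRNG state */
    DEST_INPUT_POOL  /* fast pool is fed into the input pool */
};

#define FLUSH_INTERVAL  64   /* assumption: exactly every 64th call */
#define CRNG_READY_CALL 256  /* call at which the CRNG is deemed initialized */

/* Where the fast pool goes on the given add_interrupt_randomness() call. */
static enum fast_pool_dest fast_pool_dest(unsigned int call_no)
{
    if (call_no == 0 || call_no % FLUSH_INTERVAL != 0)
        return DEST_NONE;
    return call_no <= CRNG_READY_CALL ? DEST_CRNG : DEST_INPUT_POOL;
}

/*
 * True if crng_initialize() running at this call number would fall in the
 * window the report describes: the CRNG has been declared initialized, but
 * the next flush into the input pool (call 320) has not happened yet.
 */
static bool in_overwrite_window(unsigned int call_no)
{
    return call_no >= CRNG_READY_CALL &&
           call_no < CRNG_READY_CALL + FLUSH_INTERVAL;
}
```

In this model the reported scenario needs crng_initialize() to run while
in_overwrite_window() is true, so the question is whether that can
actually happen on a real boot.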

So in practice this isn't a problem because crng_initialize is called
in early init. For reference, the ordering of init calls is:

"early", <---- crng_initialize() is here
"core", <---- ftrace is initialized here
"subsys", <---- acpi_init() is here
"device", <---- device probing is here
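
The ordering argument can be stated as a small user-space C sketch. It
models only the four levels listed above, in the order given (the kernel
defines more levels than these), and the helper names initcall_rank()
and runs_before() are hypothetical, not kernel APIs.

```c
#include <string.h>

/* Only the four levels listed above, in the order they run. */
static const char *const initcall_levels[] = {
    "early", "core", "subsys", "device"
};

/* Position of a level in the list above, or -1 if it is not listed. */
static int initcall_rank(const char *level)
{
    int n = (int)(sizeof(initcall_levels) / sizeof(initcall_levels[0]));

    for (int i = 0; i < n; i++)
        if (strcmp(initcall_levels[i], level) == 0)
            return i;
    return -1;
}

/* Nonzero if both levels are listed and level a runs before level b. */
static int runs_before(const char *a, const char *b)
{
    int ra = initcall_rank(a);
    int rb = initcall_rank(b);

    return ra >= 0 && rb >= 0 && ra < rb;
}
```

Under this model, runs_before("early", "device") holds, which is the
point: anything done at the "early" level is finished before device
probing starts generating interrupts.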

So in practice, call 256 typically happens **well** after
crng_initialize. You can see where it happens in the boot messages,
which is more than 2.5 seconds into the boot:

[ 2.570733] rtc_cmos 00:02: alarms up to one month, y3k, 114 bytes nvram, hpet irqs
[ 2.570863] usbcore: registered new interface driver i2c-tiny-usb
[ 2.571035] device-mapper: uevent: version 1.0.3
[ 2.571215] random: fast init done <-------------
[ 2.571316] device-mapper: ioctl: 4.35.0-ioctl (2016-06-23) initialised: dm-devel@xxxxxxxxxx
[ 2.571678] device-mapper: multipath round-robin: version 1.1.0 loaded
[ 2.571728] intel_pstate: Intel P-state driver initializing
[ 2.572331] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input3
[ 2.572462] intel_pstate: HWP enabled
[ 2.572464] sdhci: Secure Digital Host Controller Interface driver

When is crng_initialize() called? Sometime *before* 0.05 seconds into
the boot on my laptop:

[ 0.054529] ftrace: allocating 29140 entries in 114 pages

> In short, the situation is:
> A) No usable hardware RNG or arch_get_random() (or we don't trust it...)
> B) add_interrupt_randomness() called 256-320 times but other
> add_*_randomness() functions aren't adding much entropy.
> C) then crng_initialize() is called
> D) not enough calls to add_*_randomness() to push the entropy
> estimate over 128 (yet)
> E) getrandom(2) or /dev/urandom used for something important
> Based on a few experiments with VMs, A) through D) can occur easily in
> practice. And with no HDD we have a window of about a minute or two for
> E) to happen before add_interrupt_randomness() finally pushes the
> estimate over 128 on its own.

How did you determine when crng_initialize() was being called? On a
VM generally there are fewer interrupts than on real hardware. On
KVM, for example, I see the random: fast_init message being printed
3.6 seconds into the boot.

On Google Compute Engine, the fast_init message happens 52 seconds into the
boot.

So what VM were you using? I'm trying to figure out whether this is
a hypothetical or a real problem, and on what systems.

- Ted