Re: [PATCH 1/3] Make /dev/urandom scalable

From: Theodore Ts'o
Date: Wed Sep 23 2015 - 17:10:22 EST


On Tue, Sep 22, 2015 at 04:16:05PM -0700, Andi Kleen wrote:
>
> This patch changes the random driver to use distributed per NUMA node
> nonblocking pools. The basic structure is not changed: entropy is
> first fed into the input pool and later from there distributed
> round-robin into the blocking and non blocking pools. This patch extends
> this to use an dedicated non blocking pool for each node, and distribute
> evenly from the input pool into these distributed pools, in
> addition to the blocking pool.
>
> Then every urandom/getrandom user fetches data from its node local
> pool. At boot time when users may be still waiting for the non
> blocking pool initialization we use the node 0 non blocking pool,
> to avoid the need for different wake up queues.

What I would suggest is that we not switch over to the per-NUMA node
pools until the original node 0 pool is marked as initialized (i.e.,
until it has been initialized with 128 bits of randomness). At that
point, initialize pools 1..n from the original non-blocking pool by
using get_random_bytes() to fill them up. Only once all of the pools
are initialized should the nonblocking_node_pool variable be set. In
practice, especially on large server systems, getting the 128 bits of
randomness needed to initialize the primary non-blocking pool
shouldn't take long. On my laptop, it takes 4 seconds.

My concern, though, is that things like ssh host key generation happen
at boot time, so anything that weakens the random number generator
there is a bad thing. (In practice this wasn't a problem pre-systemd,
but systemd speeds up the boot sequence enough that it is now a
potential security problem.) We already have the problem that
systemd-udevd grabs random numbers before the random pool is
initialized:

[ 1.124926] random: systemd-udevd urandom read with 13 bits of entropy available
[ 4.137543] random: nonblocking pool is initialized

> The different per-node pools also start with different start
> states and diverge more and more over time, as they get
> feed different input data. So "replay" attacks are
> difficult after some time.

The problem is that "after some time" happens after public keys have
already been generated, so this isn't good for anything other than
helping starving academics publish papers about weaknesses in Linux's
random number generator. :-)

So I'd strongly prefer if we don't weaken the random number generator
boot-time initialization story, and the best way to do this is to wait
for a single non-blocking pool to be completely initialized, and then
use that pool to initialize the rest of the pools. If that means that
we don't have the full /dev/urandom scalability during the first few
seconds after the system is booted, to my mind that's a fair tradeoff.

Does that sound reasonable?

> For saving/restoring /dev/urandom, there is currently no mechanism
> to access the non local node pool (short of setting task affinity).
> This implies that currently the standard init/exit random save/restore
> scripts would only save node 0. On restore all pools are updates.
> So the entropy of non 0 gets lost over reboot. That seems acceptable
> to me for now (fixing this would need a new separate save/restore interface)

Yes, that's fine; I'm not really worried about that, since
getrandom(2) only provides a guarantee of initialized cryptographic
randomness, and that's what we should most care about.

- Ted
