Re: /dev/random on Linux

From: Theodore Tso
Date: Tue May 16 2006 - 16:17:34 EST


On Tue, May 16, 2006 at 04:54:00PM +0300, Zvi Gutterman wrote:
>
> I did not get any answer from Matt and was sure that it was of no interest.
> This was my mistake, sorry for not sending it earlier to more people.

Note that Matt toke over maintainance from me, but I was the original
designer of most of the /dev/random algorithms (with much help from
Colin Plumb). I haven't had time to do a full analysis of the paper,
but let me give a couple comments to set the context.

First of all, the Linux /dev/random generator has been around for a
long time; I did the original implementation back in 1994, back when
the crypto iron curtain was still being maintained by the US
Government. (As far as I know, the Linux /dev/random driver was the
first OS-based RNG and predates efforts present in other systems such
as OpenBSD, et. al. Not being an academic trying to get tenure, I
never bothered to write a paper back in '94, so I guess that means I
don't get any credit, which is probably fair if you're playing the
academic game. Publish or perish, after all. :-) In any case,
because of the U.S. export controls on cryptographic algorithms at
that time, in particular anything relating to encryption algorithms, I
chose to use cryptogaphic hashes as its primary non-linear component,
since these were not problematic from an export control perspective.

In addition, the design philosophy is somewhat different from what has
been considered the general wisdom at least in academic circles.
First of all, as noted above, export controls were a major issue. In
addition, in that time period, flaws in MD4 and MD5 were coming to
light, and it was clear that some cryptographic algorithms provided by
the US government (in particular, DSA) were deliberately designed to
prevent their usefulness in anything other than their released context
(i.e., There are many who believe that one of the DSA primary design
criteria was that it could only be used for digital signatures, and
*not* provide encryption functionality). So one of the key design
criteria was that /dev/random use its cryptographic primitives in ways
that were non-brittle --- that is, even if the primary use of that
cryptographic primitive were compromised (e.g., the recent collision
attacks against SHA-1), the security of the system would not be
compromised.

One of the reasons why I did not choose a construction such as Bruce
Schneier's Yarrow, which encrypts an incrementing counter as its
output stage, is that repeated encryptions of plaintexts that have
differ from each other by a small hamming distance is one of the
things that cryptographers look for when attacking a cipher. Maybe
modern encryption systems are so good that we don't have to worry
about such things any more (and certainly most academic papers ignore
such engineering considerations), but I like to take a much more
humble attitude towards making such assumptions, since I *know* the
cryptographers at the NSA know more than I do, and underestimating
them or any other state-sponsored cryptoanalytic organization is
generally not a good idea. In any case, while if AES is perfect, yes,
using a key to encrypt a counter might be probably good enough --- but
I'm not sure I want to make that assumption.

Secondly, attacks that assume that the attacker can obtain the
contents of the pool were not a major consideration, mainly because
practically speaking, if an attacker can obtain the contents of the
internal state of the RNG, the attacker generally has enough
privileges that the entire system has been compromised, meaning that
the system can be rooted to a fare-thee-well --- all future session
keys and data to be encrypted can then be collected, etc. So attacks
of the form, "assume that the attack gains access to the pool, but has
no other illicit access to the system" are particularly practical in
the real world, and are mostly of concern to academics.

This is particularly tree of the criteria put forward by the designers
of the Yarrow RNG, where they were concerned about how quickly the RNG
could recover after the pool state got compromised. My argument was
always that if an attacker had enough privileges to compromise the
internal state of the RNG, in the real world, recovery would require
re-installing from trusted media.

The desire for forward security is different, admittedly; here the
concern is that if the system has already been compromised, you don't
want to make it easy the attacker to find previously generated session
keys or even long-term keys. However, it should be noted that it
takes O(2**64) operations to go back a single turn of the crank--- but
that only helps you get the last 10 bytes (80 bits) that were last
generated. In order to get the next 80 bits, you have to do another
O(2**64) amount of work, and sooner or later (within 18 turns of the
crank, best case) you will run into one of the "unlucky" indexes where
it will take you at least O(2**96) of work effort. To put this in
perspective, generating a 1024 bit RSA key will require approximately
14 turns of the crank, so if you are lucky with the positioning of the
index *and* you penetrate the machine and capture the state of the
pool (which as I mentioned, probably means you've rooted the box and
the system will probably have to be reinstalled from trusted media
anyway), *and* a 1024-bit RSA key had just been generated, you would
be able to determine that 1024-bit RSA key with a work factor of
approximately O(2**68) if you are lucky (probability 1 in 8), and
O(2**96) if you are not. Should we try do better, sure, but this is
not the sort of thing where you have to keep a secret and only talk to
the official maintainer about it.

I would also note that one of the things Matt did after he took over
maintainership from me is that he took out a call to
add_timer_randomness() from the extract entropy code. This bit of
code would have also significantly complicated practical attacks on
backtracking attacks, since it mixes in the TSC cycle counter at the
time of the entropy extraction into the pool. So even if you capture
the state of the pool, unless you know exactly *when* each of the
various entropy extractions took place, you wouldn't be able to easily
calculate the various candidate pools.

More critically, the paper is seriously flawed in its analysis of
reversing the primary pool. The only reason why the generic attack
takes only 2**96 when the secondary and urandom pools are only 32
words long, is that the entropy extraction routine is only calls
add_entropy_words ceil(n/32)+1 times. However, the primary entropy
pool is 128 words long, which means that add_entropy_words() is called
9 times. The paper recognizes this in Appendix B, which shows that j,
j-1, j-2, ... j-8 is modiified when extracting entropy from the
primary pool. This makes the generic attack take work factor
O(2**288), and the optimized attack also has to be seriously reworked
before it can be carried out on the primary pool. This also points
out that the simple expedient of doubling the size of the secondary
and urandom pool would effectively eliminate the practicality of
carrying out an forward attack.

In section 3.4, a comment is made about guessable passwords to seed
the entropy pool, and a recommendation was made to remove the value of
the keyboard from the input to the pool. I would reject that
recommendation, since the entropy estimated is calulated based only on
the timing of the key, and the inter-key timing values is mixed into
*in* *addition* *to* the value of the keys typed. If the user types a
known or guessable password, the yes, the value of what is typed is no
help in adding entropy to the pool --- but it doesn't hurt, either. I
have no argument with the observation that main source of entropy
added is the inter-key arrival time, and even there we are extremely
conservative on estimating the amount of entropy found since obviously
the "fist" of a typist could be analyzed and used to provide some
information about inter-key arrival times, particularly if a
high-resolution time source is not available.

Speaking generally about usage scenarios such as the OpenWRT Linux
distribution, if there is no hard disk and relatively little user
input, the amount of entropy that can be gathered will suffer. That
however isn't the fault of the /dev/random driver, but how it is put
to use. One of the things which I would strongly recommend to
security applications, such as those found in the OpenWRT Linux
distribution, is to calculate the cryptographic hash of data which is
to be sent out encrypted, and write that into /dev/random, thus mixing
that hash into the entropy pool. This will not provide any entropy
credits (since at least one entity would have knowledge of the value
of the hash). However, to attackers that do *not* have access to the
plaintext of the encrypted communications sent out by the application,
this unknown data will significantly complicate the life of attackers
attempting to analyze the state of the random number generator. In
the absence of hard drive-generated entropy or human-generated timing
data from a keyboard or pointing device, it's certainly better than
nothing, and in practice is quite effective.

All of this being said, I think the paper does make a number of good
points, and I would recommend making the following changes to Linux's
/dev/random:

1) Restore the add_timer_randomess() call to the entropy extraction
function. This prevents the forward security attack unless the
attacker can know the exact time when each of the entropy extractions
took place, which in general is going to be quite impractical.

2) Double the size of the secondary and urandom pools from 128 to 256
bytes.

3) Investigate the possibility of adding quotas to /dev/random. This
is actually much more trickier that the paper suggests, since you want
to allow the user to be able to extract enough entropy to create a
2048 bit (or at least a 1024-bit) RSA key. The problem is that's a
lot of entropy! Maybe it would be OK to only allow a 1024-bit RSA key
to be generated every 12 or 24 hours, but suppose someone is
experimenting with GPG and screws up (say they forget their
passphrase); do you tell them that sorry, you can't generating another
key until tomorrow? So now we have to have an interface so the root
user can reset the user's entropy quota.... And even with a 24-hour
limit, on a diskless system, you don't get a lot of entropy, so even a
1024-bit RSA key could seriously deplete your supply of entropy.

This last point is a good example of the concerns one faces when
trying to design a working system in the real word, as opposed to the
concerns of academicians, where the presence or lack of forward
security in the event of a pool compromise is issue of massive urgency
and oh-my-goodness-we-can-only-tell-the-maintainer-because-it's-such-a-
critical-security-hole attitude. Where as my attitude is, "yeah, we
should fix it, but I doubt anyone has actually been harmed by this in
real life", which puts it in a different category than a buffer
overrun attack which is accessible from a publically available network
service.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/