Re: Updated scalable urandom patchkit

From: George Spelvin
Date: Wed Oct 21 2015 - 04:27:53 EST


After having worked out the basics of a non-blocking /dev/urandom read
(patches posted; I'm hoping someone will comment on them!), I need to
work out the crypto.

Patch 3/4 of that series showed how to have concurrent pool readers not
separated by mixback operations, but it was a stub, not really well
thought through. Since then, I've been trying to think of how to do
it properly. That entails figuring out exactly what the design goals
of the mixback are.

The current mixback operation works as follows (sketched in code below):
* Generate 160 bits of hash output.
* Produce 80 bits of that as user output.
* Mix all 160 bits back into the pool. This achieves two things:
  * Salting. It changes the pool state so the next hash will
    produce different output.
  * Anti-backtracking protection. By adding the output back to the input,
    it makes it difficult for an attacker who captures a later pool
    state to recover previous outputs.
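
For concreteness, here's that flow in code. This is only a sketch, not
the actual extract_buf(): hash_pool() and mix_pool_bytes() are hypothetical
stand-ins for the pool hash and the input mix, and the fold down to 80
bits is schematic (the real driver's fold differs in detail).

#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define HASH_BYTES 20   /* 160 bits of SHA-1 output */
#define OUT_BYTES  10   /* 80 bits handed to the caller */

/* Hypothetical stand-ins for the pool hash and the pool's input mix. */
extern void hash_pool(uint8_t hash[HASH_BYTES]);
extern void mix_pool_bytes(const void *in, size_t len);

static void extract_once(uint8_t out[OUT_BYTES])
{
        uint8_t hash[HASH_BYTES];
        int i;

        hash_pool(hash);                  /* 1. 160 bits of hash output */
        mix_pool_bytes(hash, HASH_BYTES); /* 2. mix all 160 bits back:
                                           *    salting + anti-backtracking */
        for (i = 0; i < OUT_BYTES; i++)   /* 3. fold to 80 output bits */
                out[i] = hash[i] ^ hash[i + OUT_BYTES];

        memset(hash, 0, sizeof(hash));    /* don't leave the hash around */
}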

Now, the salting goal can be achieved just as well with a simple
unique nonce. In the example, it's an incrementing counter and CPU ID.
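
Something like this (names made up; the only property that matters is
that it never repeats, not that it's secret):

#include <stdint.h>

/* Hypothetical per-reader salt: unique, but public. */
struct extract_nonce {
        uint64_t counter;       /* incremented on every extraction */
        uint32_t cpu;           /* CPU ID of the reader */
};

static inline void next_nonce(struct extract_nonce *n, uint32_t cpu)
{
        n->counter++;           /* guarantees each hash input is distinct */
        n->cpu = cpu;
}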

But the anti-backtracking is more complex.

First of all, we note that the current design is only computationally
secure, not information theoretically. To achieve the latter, we'd
have to actually destroy entropy in the pool (by zeroing it or doing
something like Spritz's crush() operation), which doesn't seem worth it.

The basic idea of having as much mixback as output seems good.
If you're going to do it at all, you might as well do it right,
and you don't want a 256-bit key to have a 160-bit attack from
a captured pool state.

But there is a practical maximum. I can't think of a reason why you'd
need more than the output pool size (128 bytes = 1024 bits), but maybe
256 bits is good enough. Thoughts?

(256 bits also corresponds to one full cycle of the input mix
over the pool, FWIW.)


One thing that's questionable is the fact that the entire 20 bytes are
mixed back. Because the input mix doesn't diffuse across the pool very
rapidly, it's possible for the mixback to land on a low-entropy part of
the pool and not be well hidden.

If that happens, the previous random output (or partial information about
it) would be recoverable, even if the full previous pool state is not.

I'm not sure how big a problem this is, but only mixing back 80
bits (independent of the output bits) might be better.


A change I'm thinking about, which would simplify implementation
considerably, would be to use all 160 bits of hash output for the user
output, and then do additional hashing to produce the mixback.

This greatly simplifies the contended-pool case by only requiring
readers to pass the *amount* of mixback to the lock holder, which
can be done with an atomic_add. There's no need to pass actual data,
because the lock-holder can just generate it themselves.

Specifically, in the low-contention case, the reader would use any
leftover hash bytes for mixback if it could get the lock, but it wouldn't
be a big performance hit to throw them away if someone else was holding
the lock.

In the case of extremely high contention, the limit on maximum total
mixback would let the lock holder do less mixback than the total
that would be done by the concurrent readers.
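
Here's a sketch of the accounting I have in mind. All names are made up
and the details are negotiable; the point is that readers only publish an
*amount* via an atomic add, and the lock holder generates the mixback data
itself, capped at the maximum discussed above:

#include <stdatomic.h>
#include <stddef.h>

#define MIXBACK_MAX     32      /* cap on total mixback, in bytes (256 bits) */

static atomic_uint mixback_owed;        /* bytes of mixback requested by readers */

/* Hypothetical helpers: generate fresh hash output, and the input mix. */
extern void hash_pool_bytes(void *out, size_t len);
extern void mix_pool_bytes(const void *in, size_t len);

/* Reader that couldn't get the lock: publish how much mixback it owes. */
static void request_mixback(unsigned int nbytes)
{
        atomic_fetch_add_explicit(&mixback_owed, nbytes, memory_order_relaxed);
}

/* Called by whoever holds the pool lock, before releasing it. */
static void do_mixback(void)
{
        unsigned int owed = atomic_exchange_explicit(&mixback_owed, 0,
                                                     memory_order_relaxed);
        unsigned char buf[MIXBACK_MAX];

        if (!owed)
                return;
        if (owed > MIXBACK_MAX)
                owed = MIXBACK_MAX;     /* high contention: less than the sum */

        hash_pool_bytes(buf, owed);     /* lock holder generates the data itself */
        mix_pool_bytes(buf, owed);
}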


But that requires understanding the reasons for only outputting
half of the hash result, to know if it's safe to make the change.
Three possibilities come to mind:

1. To produce enough mixback. This is preserved by the proposed change.

2. To avoid entropy loss due to collisions in the non-invertible final
addition in a "narrow-pipe" hash like SHA.

This is a generic issue with hashes which do not maintain an internal
state wider than their output. This includes MD5, SHA1, SHA2, and
BLAKE/BLAKE2. (But not Keccak, SipHash, JH, Grøstl, Skein, etc.)

Assume the first half of the pool has lots of entropy, so the 160-bit
output of the first sha_transform() call is perfectly uniformly
distributed. (Each of the 2^160 possible outputs occurs with probability
2^-160, so 160 bits of entropy by any measure.)

If the second half of the pool has no entropy (a known, fixed value),
then sha_transform approximates a 160-to-160 bit random function.
Such a function has output collisions with probabilities described
by the Poisson distribution with mean (lambda) 1. (This number is
the ratio of input and output state sizes.)

In particular, 1/e = e^-1 = 36.788% of the outputs do not appear for
any input. (If there are k times more inputs than outputs, all equally
likely, it's e^-k. So just a few bits reduces that drastically.)

If you do the Shannon entropy math, that means that the output
has 0.827245389... fewer bits of entropy than the input.
(If there are k states in the second half of the pool, the entropy loss
is slightly less than 1/k of this value. So 163 bits in produces 160 -
0.827/8 = 159.9 bits out.)

If you're going by min-entropy, it's worse. Over 2^160 inputs,
the Poisson distribution predicts 3.5 41-way collisions, and 0.0866
42-way collisions. So the loss is log2(41) = 5.358 bits. (If there
are 8 states in the second half, then we expect a 77-way collision, for
a min-entropy of 160 - log2(77/8) = 160 - 3.2668 = 156.73 bits.)

(For more on this, the work of Andrea Röck, e.g. "Collision Attacks
based on the Entropy Loss caused by Random Functions", is good reading.)
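
If anyone wants to check the 0.827245 figure, the Poisson model is easy
to evaluate numerically. This is throwaway userspace code, not kernel
code: lambda is the ratio of inputs to outputs, the preimage count of
each output is ~Poisson(lambda), and the deficit below a perfectly
uniform output is E[K*log2(K)]/lambda - log2(lambda). It also prints the
e^-lambda fraction of unreachable outputs mentioned above.

/* cc poisson.c -lm */
#include <math.h>
#include <stdio.h>

int main(void)
{
        const double lambdas[] = { 1, 2, 4, 8 };

        for (int i = 0; i < 4; i++) {
                double lambda = lambdas[i];
                double pk = exp(-lambda);       /* P[K = 0] */
                double e_klogk = 0.0;           /* E[K * log2(K)] */

                for (int k = 1; k <= 200; k++) {
                        pk *= lambda / k;       /* now P[K = k] */
                        if (k >= 2)
                                e_klogk += pk * k * log2((double)k);
                }

                printf("lambda=%g: unreachable outputs=%.3f%%, "
                       "Shannon deficit=%.6f bits\n",
                       lambda, 100.0 * exp(-lambda),
                       e_klogk / lambda - log2(lambda));
        }
        return 0;
}

For lambda=1 this prints a deficit of 0.827245 bits; for lambda=8 it's
roughly 0.09, consistent with the "slightly less than 1/k" remark above.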

Although this affects all narrow-pipe hashes, you only need to chop a
few bits off the hash output to be nearly perfect in Shannon terms, and
chopping 8 bits gets you less than 1 bit of min-entropy lost. With 2
bytes, the largest collision you expect to find at least one of among
2^176 inputs is about 69270-way, resulting in a loss of log2(69270/65536)
= 0.08 bit of min-entropy.
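
The 2-byte number can be sanity-checked the same way (again, throwaway
userspace code). With 2^176 inputs into 2^160 outputs, each output has
~Poisson(2^16) preimages, and the expected maximum over 2^160 outputs is
roughly lambda + sqrt(2 * lambda * ln(2^160)), which lands right around
the 69270-way figure:

/* cc maxbucket.c -lm */
#include <math.h>
#include <stdio.h>

int main(void)
{
        double lambda = 65536.0;                /* 2^176 inputs / 2^160 outputs */
        double ln_outputs = 160.0 * log(2.0);   /* ln(2^160) */
        double max_bucket = lambda + sqrt(2.0 * lambda * ln_outputs);

        printf("largest expected multicollision: ~%.0f-way\n", max_bucket);
        printf("min-entropy loss: ~%.3f bits\n", log2(max_bucket / lambda));
        return 0;
}

This prints roughly a 69300-way collision and about 0.08 bit of
min-entropy loss.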

So while I can see the point to truncating the hash, cutting it in half
seems unnecessary.

3. General "I don't trust SHA-1" principles without a more specific
reason. If this is all there is, would substituting a different hash
(I'm looking into BLAKE2 at Ted's suggestion) help?

Note that the entire SHAppening issue is not particularly relevant
to /dev/urandom attacks. Attacking the /dev/urandom output mix is
more like a pre-image attack, but considerably harder, since you're
trying to find not just *a* pre-image, but *the* pre-image.

Using the expected pre-image resistance of the hash is insanely
conservative. Perhaps not a bad thing, though. But could we use
128 bits rather than 80?


Finally, I'm trying to figure out how to use the get_random_int_hash
array for salt material. We already have it, so it would be nice to use.
But the current way it's used (basically, it uses random_int_secret as
a key and runs the hash in OFB mode) leaves the last random number visible.
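
From memory (so the details may be off), the structure is roughly the
following; the helper names are stand-ins for the real per-CPU array and
kernel helpers, but the point is that the word handed to the caller is
left sitting in the per-CPU state:

#include <stdint.h>

/* Hypothetical stand-ins for the real per-CPU state and helpers. */
extern uint32_t *this_cpu_random_int_hash(void);        /* get_random_int_hash[] */
extern const uint32_t *random_int_secret_key(void);     /* random_int_secret     */
extern void md5_like_transform(uint32_t state[4], const uint32_t key[16]);
extern uint32_t cheap_timestamp(void);                  /* pid/jiffies/cycles    */

static uint32_t get_random_int_sketch(void)
{
        uint32_t *hash = this_cpu_random_int_hash();

        hash[0] += cheap_timestamp();   /* cheap, mostly public tweak */
        md5_like_transform(hash, random_int_secret_key());
                                        /* OFB step: state = H(state, key) */
        return hash[0]; /* the output *is* a state word, so the last random
                         * number stays visible until the next call */
}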

A pool reader must change its salt if it can't get the mixback lock.
There may be a few bytes of leftover entropy available for the purpose
if the hash size doesn't divide the read size.

I could always do another hash iteration to reseed get_random_int_hash if
I can't get the mixback lock, but is there a cheaper way?