On Thu, Sep 24, 2015 at 12:00:44PM -0400, Austin S Hemmelgarn wrote:
I do not remember the exact dieharder version or command-line arguments
(this was almost a decade ago), except that I compiled it from source
myself. I do remember it was a 32-bit x86 processor (as that was sadly
all I had to run Linux on at the time) and an early 2.6 series kernel
(which, if I remember correctly, was already EOL by the time I was using
it). It may have been impacted by the fact that I did the testing in
QEMU, but I would not expect that to affect things that much. It is
worth noting that I only saw this happen three times, and each time it
was in a sample of 2000 runs (which has always been the sample size I've
used, as that's the point at which I tend to get impatient).
I've had cases where I've done thousands of dieharder runs, and it
failed almost 10% of the time, while stuff like mt19937 fails in
otherwise identical tests only about 1-2% of the time.
That is a startling result. Please say what architecture, kernel
version, dieharder version and commandline arguments you are using to
get 10% WEAK or FAILED assessments from dieharder on /dev/urandom.
The diehard_sums test is known and documented to be a flawed test. As
for the other failures, even a top-quality RNG should get them
sometimes, because a good RNG _should_ spit out long runs of identical
bits from time to time. (This is why the absolute insanity that is the
FIPS cryptography standards should never be considered when doing
anything other than security work, and only cautiously even there.)
Based on what I've seen with the AES_OFB generator, 'perfect' generators
should be getting WEAK results about 1% of the time, and FAILED results
about 0.1% of the time (except on diehard_sums).
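To put the quoted rates side by side, here is a rough sketch of the arithmetic. It assumes each run is an independent trial with a fixed 1% chance of being flagged (the WEAK rate quoted above) and uses the 2000-run sample size mentioned in this thread. The function names are hypothetical, and the binomial model is a simplification chosen for illustration, not something dieharder itself reports.

```python
import math

def binom_log_pmf(n, k, p):
    """Natural log of P(X = k) for X ~ Binomial(n, p), with 0 < p < 1."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)

def tail_prob(n, k_min, p):
    """P(X >= k_min): chance of at least k_min flagged runs out of n."""
    # Working in log space avoids overflow in the binomial coefficient;
    # terms that underflow simply contribute 0.0 to the sum.
    return sum(math.exp(binom_log_pmf(n, k, p)) for k in range(k_min, n + 1))

if __name__ == "__main__":
    runs = 2000        # sample size mentioned in the thread
    weak_rate = 0.01   # assumed per-run WEAK rate for a good generator
    print("expected flagged runs:", runs * weak_rate)
    print("P(at least 10% flagged):", tail_prob(runs, 200, weak_rate))
```

Under these assumptions about 20 flagged runs out of 2000 would be unremarkable, while 200 or more would be vanishingly unlikely to occur by chance, which is why a reproducible 10% rate would point at the generator or at the test setup.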
Since the structure of Linux urandom involves taking a cryptographic
hash, the basic expectation is that it would fail statistical randomness
tests at similar rates to, e.g., dieharder's AES_OFB (-g 205), even in
the absence of any entropy in the kernel pools.
So if 10% failures on correct statistical tests can be replicated, it is
important and needs attention.
I did take a few moments to look into this today and got startling
failures (p-value 0.00000000) with, e.g.,

    dieharder -g 501 -d 10

(and a few other tests) using dieharder 3.31.1 on both Debian
linux-4.1-rt-amd64 and Debian kfreebsd-10-amd64, but this seems to be an
upstream bug known at least to Debian and Red Hat, possibly fixed in
current Fedora but apparently not in Debian.
If you have an affected version, these failures are seen only with -g
501, not with -g 200 < /dev/urandom. They are probably also not seen
with 32-bit dieharder.
diehard_parking_lot|   0|     12000|     100|0.00000000|  FAILED
   diehard_2dsphere|   2|      8000|     100|0.00000000|  FAILED
   diehard_3dsphere|   3|      4000|     100|0.00000000|  FAILED
    diehard_squeeze|   0|    100000|     100|0.00000000|  FAILED
       diehard_sums|   0|       100|     100|0.00000000|  FAILED
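For anyone repeating this over many runs, a small sketch of tallying assessments from result rows like the ones above may help. It assumes every result row has six pipe-delimited fields, as in the quoted output. The function names and dictionary keys are labels chosen for this sketch and are not part of dieharder.

```python
from collections import Counter

# Result rows as quoted in this message.
SAMPLE = """\
diehard_parking_lot| 0| 12000| 100|0.00000000| FAILED
diehard_2dsphere| 2| 8000| 100|0.00000000| FAILED
diehard_3dsphere| 3| 4000| 100|0.00000000| FAILED
diehard_squeeze| 0| 100000| 100|0.00000000| FAILED
diehard_sums| 0| 100| 100|0.00000000| FAILED
"""

def parse_row(line):
    """Parse one pipe-delimited result row; return None for anything else."""
    fields = [f.strip() for f in line.split("|")]
    if len(fields) != 6:
        return None  # separator, banner, or unrelated line
    name, ntup, tsamples, psamples, pvalue, assessment = fields
    try:
        return {
            "test": name,
            "ntup": int(ntup),
            "tsamples": int(tsamples),
            "psamples": int(psamples),
            "pvalue": float(pvalue),
            "assessment": assessment,
        }
    except ValueError:
        return None  # six fields but not numeric, e.g. a column header

def tally(lines):
    """Count how often each assessment appears among the parsable rows."""
    rows = (parse_row(line) for line in lines)
    return Counter(row["assessment"] for row in rows if row is not None)

if __name__ == "__main__":
    print(tally(SAMPLE.splitlines()))
```

Feeding the concatenated output of repeated runs through `tally` gives the counts needed to compare an observed failure rate against the expected one.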