Re: [PATCH 07/10] random: add new get_random_bytes_arch() function

From: Ingo Molnar
Date: Wed Jul 25 2012 - 03:22:44 EST



* H. Peter Anvin <hpa@xxxxxxxxx> wrote:

> For those who have read the Google+ thread[1] it is pretty
> clear that there are varying opinions on the idea of removing
> the RDRAND bypass.
>
> I have gathered some performance numbers to make the debate
> more concrete: RDRAND is between 12 and 15 times faster than
> the current random pool system (for large and small blocks,
> respectively). Both the pool system and RDRAND scale
> perfectly with frequency, so the ratio is independent of
> P-states.
>
> Given the discrepancy in performance (and presumably in power)
> I still very much believe it is a mistake to unconditionally
> deny users the option of using RDRAND directly, but what I *do*
> believe we can all agree on is that
> security is paramount. Dropping RDRAND is not just a
> performance loss but is likely a security loss since it will
> produce substantially less entropy.
>
> As a compromise I offer the following patch; in terms of
> performance it is "the worst of both worlds", but it should
> provide the combined security of both: even if RDRAND is
> completely compromised by the NSA, Microsoft and the
> Illuminati all at once, it will do no worse than the existing
> code, [...]

I should mention that since there are documented historical
examples of our software random pool in drivers/char/random.c
having bugs that no one noticed, XOR-ing the RDRAND hardware
stream into the software entropy pool's stream is the sensible
thing to do from a security point of view.
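
The combination is the classic XOR construction: assuming the two
streams are independent, the XOR is at least as unpredictable as the
stronger of the two, so a broken (or backdoored) source can at worst
contribute nothing. A minimal sketch of the idea, where
sw_random_long() and hw_random_long() are illustrative stand-ins for
the pool output and RDRAND, not actual kernel APIs:

/*
 * XOR construction: combine a software stream with a hardware
 * stream. Assuming independence, the result is never weaker
 * than the software stream alone.
 */
static int mixed_random_long(unsigned long *out)
{
	unsigned long sw, hw;

	if (!sw_random_long(&sw))	/* software pool stream */
		return 0;
	if (hw_random_long(&hw))	/* hardware stream, if present */
		sw ^= hw;		/* XOR in the hardware bits */
	*out = sw;
	return 1;
}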

So your patch recovers the security lost via this recent patch
in random.git:

c2557a303ab6 random: add new get_random_bytes_arch() function

I don't think random.git should be sent upstream without
addressing this Ivy Bridge security regression first.

Btw., the commit above has a very misleading title: it does not
just 'add' a function; the far more important change is that it
removes RDRAND from the output stream.
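
For reference, the new function is roughly the following sketch
(from memory, not the literal commit): it fills the buffer from
arch_get_random_long() for as long as the hardware delivers and
falls back to the software pool for the rest, while
get_random_bytes() itself no longer touches RDRAND at all:

void get_random_bytes_arch(void *buf, int nbytes)
{
	char *p = buf;

	while (nbytes) {
		unsigned long v;
		int chunk = min(nbytes, (int)sizeof(unsigned long));

		if (!arch_get_random_long(&v))
			break;
		memcpy(p, &v, chunk);
		p += chunk;
		nbytes -= chunk;
	}

	/* fall back to the software pool for whatever is left */
	if (nbytes)
		get_random_bytes(p, nbytes);
}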

The commit also made random number generation an order of
magnitude slower.
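
For anyone who wants to reproduce the comparison, here is a minimal
userspace sketch of the measurement (my own illustration, not hpa's
actual benchmark; it reads /dev/urandom as a proxy for the pool
system, and assumes an RDRAND-capable CPU and gcc -O2 -mrdrnd):

#include <stdio.h>
#include <time.h>
#include <immintrin.h>

#define BUFSZ (1 << 20)		/* 1 MiB per pass */

static double now(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
	static unsigned long long buf[BUFSZ / sizeof(unsigned long long)];
	size_t i, n = BUFSZ / sizeof(unsigned long long);
	double t0, t1;
	FILE *f;

	t0 = now();
	for (i = 0; i < n; i++)
		while (!_rdrand64_step(&buf[i]))
			;	/* retry on the rare RDRAND underflow */
	t1 = now();
	printf("RDRAND:       %.1f MB/s\n", BUFSZ / 1e6 / (t1 - t0));

	f = fopen("/dev/urandom", "rb");
	if (!f)
		return 1;
	t0 = now();
	if (fread(buf, 1, BUFSZ, f) != BUFSZ)
		return 1;
	t1 = now();
	printf("/dev/urandom: %.1f MB/s\n", BUFSZ / 1e6 / (t1 - t0));
	fclose(f);

	return 0;
}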

> [...] and (since RDRAND is so much faster than the existing
> code) it has only a modest performance cost. More
> realistically, it will let many more users take advantage of a
> high-entropy, quick-reseeding random number generator, thus
> ending up with a major gain in security.
>
> It is worth noting that although RDRAND by itself is adequate
> for any in-kernel users (and the 3.4-3.5 kernels use it as such
> unless you specify "nordrand"), this is not true for /dev/random,
> nor, due to abuse, for /dev/urandom. The recently disclosed[2]
> RDSEED instruction, on the other hand, is defined to be fully
> entropic and can be used for any purpose; it will be introduced
> in a later processor.
>
> Note that the attached patch is way more conservative than it
> needs to be: since I moved the mixing to extract_buf(), every
> byte is mixed with RDRAND data twice on its way through (and an
> additional 1.2 bytes are lost), but even so the overhead is
> modest, and mixing in extract_buf() makes the code quite a bit
> simpler.
>
> This patch is on top of random.git.
>
>
> [1] https://plus.google.com/115124063126128475540/posts/KbAEJKMsAfq
>
> [2] http://software.intel.com/file/45207
>
> --
> H. Peter Anvin, Intel Open Source Technology Center
> I work for Intel. I don't speak on their behalf.
>

> From b36c22b00c6bf8e91a758d3167e912b0ac4f0d0c Mon Sep 17 00:00:00 2001
> From: "H. Peter Anvin" <hpa@xxxxxxxxxxxxxxx>
> Date: Tue, 24 Jul 2012 14:48:56 -0700
> Subject: [PATCH] random: mix in architectural randomness in extract_buf()
>
> RDRAND is so much faster than the Linux pool system that we can
> always just mix in architectural randomness.
>
> Doing this in extract_buf() lets us do it in one convenient
> place; unfortunately, the output size (10 bytes) is maximally
> awkward. That, plus the fact that every output byte will have
> passed through extract_buf() twice, means we are not being very
> efficient with the RDRAND use.
>
> Measurements show that RDRAND is 12-15 times faster than the
> Linux pool system. Doing the math shows this corresponds to
> about an 11.5% slowdown, which is confirmed by measurements.
>
> Users who are very performance- or power-sensitive could definitely
> still benefit from being allowed to use RDRAND directly, but I
> believe this version should satisfy even the most hyper-paranoid
> crowd.
>
> Signed-off-by: H. Peter Anvin <hpa@xxxxxxxxxxxxxxx>
> Cc: DJ Johnson <dj.johnson@xxxxxxxxx>
> ---
> drivers/char/random.c | 56 ++++++++++++++++++++++++++++---------------------
> 1 file changed, 32 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 9793b40..a4a24e4 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -277,6 +277,8 @@
> #define SEC_XFER_SIZE 512
> #define EXTRACT_SIZE 10
>
> +#define LONGS(x) (((x) + sizeof(unsigned long) - 1)/sizeof(unsigned long))
> +
> /*
> * The minimum number of bits of entropy before we wake up a read on
> * /dev/random. Should be enough to do a significant reseed.
> @@ -813,11 +815,7 @@ static ssize_t extract_entropy(struct entropy_store *r, void *buf,
> */
> static void xfer_secondary_pool(struct entropy_store *r, size_t nbytes)
> {
> - union {
> - __u32 tmp[OUTPUT_POOL_WORDS];
> - long hwrand[4];
> - } u;
> - int i;
> + __u32 tmp[OUTPUT_POOL_WORDS];
>
> if (r->pull && r->entropy_count < nbytes * 8 &&
> r->entropy_count < r->poolinfo->POOLBITS) {
> @@ -828,23 +826,17 @@ static void xfer_secondary_pool(struct entropy_store *r, size_t nbytes)
> /* pull at least as many as BYTES as wakeup BITS */
> bytes = max_t(int, bytes, random_read_wakeup_thresh / 8);
> /* but never more than the buffer size */
> - bytes = min_t(int, bytes, sizeof(u.tmp));
> + bytes = min_t(int, bytes, sizeof(tmp));
>
> DEBUG_ENT("going to reseed %s with %d bits "
> "(%d of %d requested)\n",
> r->name, bytes * 8, nbytes * 8, r->entropy_count);
>
> - bytes = extract_entropy(r->pull, u.tmp, bytes,
> + bytes = extract_entropy(r->pull, tmp, bytes,
> random_read_wakeup_thresh / 8, rsvd);
> - mix_pool_bytes(r, u.tmp, bytes, NULL);
> + mix_pool_bytes(r, tmp, bytes, NULL);
> credit_entropy_bits(r, bytes*8);
> }
> - kmemcheck_mark_initialized(&u.hwrand, sizeof(u.hwrand));
> - for (i = 0; i < 4; i++)
> - if (arch_get_random_long(&u.hwrand[i]))
> - break;
> - if (i)
> - mix_pool_bytes(r, &u.hwrand, sizeof(u.hwrand), 0);
> }
>
> /*
> @@ -901,15 +893,19 @@ static size_t account(struct entropy_store *r, size_t nbytes, int min,
> static void extract_buf(struct entropy_store *r, __u8 *out)
> {
> int i;
> - __u32 hash[5], workspace[SHA_WORKSPACE_WORDS];
> + union {
> + __u32 w[5];
> + unsigned long l[LONGS(EXTRACT_SIZE)];
> + } hash;
> + __u32 workspace[SHA_WORKSPACE_WORDS];
> __u8 extract[64];
> unsigned long flags;
>
> /* Generate a hash across the pool, 16 words (512 bits) at a time */
> - sha_init(hash);
> + sha_init(hash.w);
> spin_lock_irqsave(&r->lock, flags);
> for (i = 0; i < r->poolinfo->poolwords; i += 16)
> - sha_transform(hash, (__u8 *)(r->pool + i), workspace);
> + sha_transform(hash.w, (__u8 *)(r->pool + i), workspace);
>
> /*
> * We mix the hash back into the pool to prevent backtracking
> @@ -920,14 +916,14 @@ static void extract_buf(struct entropy_store *r, __u8 *out)
> * brute-forcing the feedback as hard as brute-forcing the
> * hash.
> */
> - __mix_pool_bytes(r, hash, sizeof(hash), extract);
> + __mix_pool_bytes(r, hash.w, sizeof(hash.w), extract);
> spin_unlock_irqrestore(&r->lock, flags);
>
> /*
> * To avoid duplicates, we atomically extract a portion of the
> * pool while mixing, and hash one final time.
> */
> - sha_transform(hash, extract, workspace);
> + sha_transform(hash.w, extract, workspace);
> memset(extract, 0, sizeof(extract));
> memset(workspace, 0, sizeof(workspace));
>
> @@ -936,11 +932,23 @@ static void extract_buf(struct entropy_store *r, __u8 *out)
> * pattern, we fold it in half. Thus, we always feed back
> * twice as much data as we output.
> */
> - hash[0] ^= hash[3];
> - hash[1] ^= hash[4];
> - hash[2] ^= rol32(hash[2], 16);
> - memcpy(out, hash, EXTRACT_SIZE);
> - memset(hash, 0, sizeof(hash));
> + hash.w[0] ^= hash.w[3];
> + hash.w[1] ^= hash.w[4];
> + hash.w[2] ^= rol32(hash.w[2], 16);
> +
> + /*
> + * If we have an architectural hardware random number
> + * generator, mix that in, too.
> + */
> + for (i = 0; i < LONGS(EXTRACT_SIZE); i++) {
> + unsigned long v;
> + if (!arch_get_random_long(&v))
> + break;
> + hash.l[i] ^= v;
> + }
> +
> + memcpy(out, hash.w, EXTRACT_SIZE);
> + memset(&hash, 0, sizeof(hash));
> }
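
For reference, the ~11.5% figure in the commit message can be checked
with back-of-the-envelope math (my estimate, not hpa's exact
calculation): on 64-bit, each extract_buf() pass draws
LONGS(EXTRACT_SIZE) = 2 longs = 16 bytes of RDRAND for 10 bytes of
output, i.e. 1.6 RDRAND bytes per output byte per pass, and every
delivered byte makes two such passes. With RDRAND ~14x faster per
byte than the pool (the midpoint of the 12-15x measurements), the
added cost is about 1.6/14 ≈ 11.4% of the pool's cost, matching the
quoted slowdown.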

Acked-by: Ingo Molnar <mingo@xxxxxxxxxx>

Thanks,

Ingo