[RFC PATCH 41/41] random: lower per-IRQ entropy estimate upon health test failure

From: Nicolai Stange
Date: Mon Sep 21 2020 - 04:01:19 EST


Currently, if fips_enabled is set, a per-IRQ min-entropy estimate of
either 1 bit or 1/8 bit is assumed, depending on whether a high-resolution
get_cycles() is available or not. The statistical NIST SP800-90B startup
health tests are run on a certain number of noise samples and are intended
to reject in case this hypothesis turns out to be wrong, i.e. if the
actual min-entropy is smaller. As long as the startup tests haven't
finished, entropy dispatch, and thus the initial crng seeding, is
inhibited. On test failure, the startup tests restart themselves from the
beginning.

It follows that if a system's actual per-IRQ min-entropy is smaller than
the more or less arbitrarily assessed 1 bit or 1/8 bit respectively, there
is a good chance that the initial crng seeding will never complete.
AFAICT, such a situation could potentially prevent certain userspace
daemons like OpenSSH from starting.

In order to still be able to make any progress, make
add_interrupt_randomness() halve the estimated per-IRQ min-entropy upon
each health test failure, but only until the minimum supported value of
1/64 bit has been reached. Note that health test failures already cause a
restart of the startup health tests, and thus a certain number of
additional noise samples, i.e. IRQ events, will have to be examined by the
health tests before the initial crng seeding can take place. The number of
fresh events required is inversely proportional to the estimated per-IRQ
min-entropy H: for the Adaptive Proportion Test (APT) it equals ~128 / H.
It follows that this patch won't be of much help for embedded systems or
VMs with poor IRQ rates at boot time, at least not without manual
intervention. But there aren't many options left when fips_enabled is set.

With respect to NIST SP800-90B conformance, this patch enters kind of a
gray area: NIST SP800-90B has no notion of such a dynamically adjusted
min-entropy estimate. Instead, it is assumed that some fixed value has
been estimated based on general principles and subsequently validated in
the course of the certification process. However, I would argue that if a
system had successfully passed certification for 1 bit or 1/8 bit
respectively of estimated min-entropy per sample, it would automatically
be approved for all smaller values as well: had we started out with such a
lower value from the beginning, the health tests would never have
complained in the first place and the system would have come up just fine.

Finally, note that all statistical tests have a non-zero probability of
false positives, and the NIST SP800-90B health tests are no exception. In
order not to keep the estimated per-IRQ entropy at a smaller level than
necessary forever after spurious health test failures, make
add_interrupt_randomness() attempt to double it again after a certain
number of successful health test passes have been completed at the
degraded entropy level. This threshold should not be too small, in order
to avoid excessive entropy accounting loss from continuously alternating
between a too large per-IRQ entropy estimate and the next smaller value.
For now, choose a value of five as a compromise between quick recovery and
limiting said accounting loss.

So, introduce a new member ->good_tests to struct fast_pool for keeping
track of the number of successful health test passes. Make
add_interrupt_randomness() increment it upon successful health test
completion and reset it to zero on failures. Make
add_interrupt_randomness() double the current min-entropy estimate and
restart the startup health tests in case ->good_tests is > 4 and the
estimate had previously been lowered.

Signed-off-by: Nicolai Stange <nstange@xxxxxxx>
---
drivers/char/random.c | 25 +++++++++++++++++++++++--
1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index bb79dcb96882..24c09ba9d7d0 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1126,6 +1126,7 @@ struct fast_pool {
bool dispatch_needed : 1;
bool discard_needed : 1;
int event_entropy_shift;
+ unsigned int good_tests;
struct queued_entropy q;
struct health_test health;
};
@@ -1926,9 +1927,13 @@ void add_interrupt_randomness(int irq, int irq_flags)
cycles);
if (unlikely(health_result == health_discard)) {
/*
- * Oops, something's odd. Restart the startup
- * tests.
+ * Oops, something's odd. Lower the entropy
+ * estimate and restart the startup tests.
*/
+ fast_pool->event_entropy_shift =
+ min_t(unsigned int,
+ fast_pool->event_entropy_shift + 1, 6);
+ fast_pool->good_tests = 0;
health_test_reset(&fast_pool->health,
fast_pool->event_entropy_shift);
}
@@ -1951,6 +1956,7 @@ void add_interrupt_randomness(int irq, int irq_flags)
* entropy discard request?
*/
fast_pool->dispatch_needed = !fast_pool->discard_needed;
+ fast_pool->good_tests++;
break;

case health_discard:
@@ -2005,6 +2011,21 @@ void add_interrupt_randomness(int irq, int irq_flags)
if (fast_pool->dispatch_needed || health_result == health_none) {
reseed = __dispatch_queued_entropy_fast(r, q);
fast_pool->dispatch_needed = false;
+
+ /*
+ * In case the estimated per-IRQ min-entropy had to be
+ * lowered due to health test failure, but the lower
+ * value has proven to withstand the tests for some
+ * time now, try to give the next better value another
+ * shot.
+ */
+ if (unlikely((fast_pool->event_entropy_shift >
+ min_irq_event_entropy_shift())) &&
+ fast_pool->good_tests > 4) {
+ fast_pool->event_entropy_shift--;
+ health_test_reset(&fast_pool->health,
+ fast_pool->event_entropy_shift);
+ }
} else if (fast_pool->discard_needed) {
int dummy;

--
2.26.2