Re: [PATCH 2/3] random: rng-seed source is utf-8

From: Theodore Y. Ts'o
Date: Fri Feb 14 2020 - 19:57:43 EST


On Fri, Feb 14, 2020 at 02:55:36PM -0800, Mark Salyzyn wrote:
> > This is why I really think what gets specified via the boot command
> > line, or bootconfig, should specify the bits of entropy and the
> > entropy seed *separately*, so it can be specified explicitly, instead
> > of assuming that *everyone knows* that rng-seed is either (a) a binary
> > string, or (b) utf-8, or (c) a hex string. The fact is, everyone does
> > *not* know, or everyone will have a different implementation, which
> > everyone will say is *obviously* the only way to go....
> >
> Given that the valid options are between 4 (hex), 6 (utf-8) or 8 (binary)
> bits per character, we can either split the difference and accept 6; or
> take a pass at the values and determine which of the sets they belong to
> ([0-9a-fA-F], [!-~] or [\000-\377]), with no need to separately specify.

So let's split this up into separate issues. First of all, as an
architectural matter, I really think we need to change
add_bootloader_randomness() in drivers/char/random.c so that it looks like this:

void add_bootloader_randomness(const void *buf, unsigned int size, unsigned int entropy_bits)

That's because this is a general function that could be used by any
number of bootloaders. For example, the UEFI boot path can use the
UEFI call that returns binary bits; some other bootloader might pass
utf-8, etc. So it would be an abstraction violation to have code in
drivers/char/random.c make assumptions about how a particular
bootloader behaves.
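
Just as a sketch (not actual kernel code; I'm assuming the existing
helpers in drivers/char/random.c), the three-argument form could be
layered on what's already there: add_device_randomness() mixes in data
without crediting any entropy, and add_hwgenerator_randomness() takes
an explicit entropy credit in bits:

void add_bootloader_randomness(const void *buf, unsigned int size,
                               unsigned int entropy_bits)
{
        /* Only credit entropy if the bootloader is trusted. */
        if (IS_ENABLED(CONFIG_RANDOM_TRUST_BOOTLOADER) && entropy_bits)
                add_hwgenerator_randomness(buf, size, entropy_bits);
        else
                add_device_randomness(buf, size);
}

That way each bootloader glue layer (EFI, bootconfig, command line,
whatever) decides for itself how much entropy to claim, and random.c
doesn't have to know anything about encodings.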

The second question is: if we are going to parse an rng_seed parameter
when it shows up in bootconfig or on the boot command line, how do we
decide how many bits of entropy it actually has? The advantage of
using the boot command line is that we don't need to change the rest
of the bootloader ecosystem. But that's also a massive weakness, since
apparently some people are already using it, and perhaps not all in
the same way.

So what I'd really prefer is that we use something new, and that we
define it in a way that comes as close as possible to "impossible to
misuse" (see Rusty Russell's API design manifesto[1]). So I'm going to
propose something different: a new parameter, say entropy_seed_hex,
which *must* be a hex string:

entropy_seed_hex=7337db91a4824e3480ba6d2dfaa60bec

If it is not a valid hex string, it gets zero entropy credit.
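
As a rough sketch of what the command line side might look like (the
parameter name is the one proposed above; none of this is existing
code, and the accounting here simply credits 4 bits of entropy per hex
character):

static int __init parse_entropy_seed_hex(char *str)
{
        u8 seed[64];
        size_t len = strlen(str);

        /*
         * Anything that isn't an even-length hex string of a sane
         * size gets mixed in with zero entropy credit.
         */
        if (!len || (len & 1) || len > 2 * sizeof(seed) ||
            hex2bin(seed, str, len / 2)) {
                add_bootloader_randomness(str, len, 0);
                return 0;
        }

        /* Valid hex: credit 4 bits of entropy per character. */
        add_bootloader_randomness(seed, len / 2, 4 * len);
        memzero_explicit(seed, sizeof(seed));
        return 0;
}
early_param("entropy_seed_hex", parse_entropy_seed_hex);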

I don't think we really need to worry about efficient encoding of the
seed, since 256 bits is only 64 characters using a hex string. And
whether it's 32 characters or 64 characters, the max command line
string is 32k, so it's probably not worth it to try to do something
more complex. (And only 128 bits is needed to declare the CRNG to be
fully initialized, in which case we're talking about 16 characters
versus 32 characters.)

[1] http://sweng.the-davies.net/Home/rustys-api-design-manifesto

- Ted