Re: Linux 5.3-rc8

From: Linus Torvalds
Date: Sun Sep 15 2019 - 13:02:42 EST

On Sat, Sep 14, 2019 at 11:56 PM Lennart Poettering
<mzxreary@xxxxxxxxxxx> wrote:
> I am not expecting the kernel to guarantee entropy. I just expecting
> the kernel to not give me garbage knowingly. It's OK if it gives me
> garbage unknowingly, but I have a problem if it gives me trash all the
> time.

So realistically, we never actually give you *garbage*.

It's just that we try very hard to actually give you some entropy
guarantees, and that we can't always do in a timely manner -
particularly if you don't help.

But on a PC, we can _almost_ guarantee entropy. Even with a golden
image, we do mix in:

- timestamp counter on every device interrupt (but "device interrupt"
doesn't include things like the local CPU timer, so it really needs
device activity)

- random boot and BIOS memory (dmi tables, the EFI RNG entry, etc)

- various device state (things like MAC addresses when registering
network devices, USB device numbers, etc)

- and obviously any CPU rdrand data

and note the "mix in" part - it's all designed so that you don't trust
any of this for randomness on its own, but very much hopefully it
means that almost *any* differences in boot environment will add a
fair amount of unpredictable behavior.

But also note the "on a PC" part.

Also note that as far as the kernel is concerned, none of the above
counts as "entropy" for us, except to a very small degree the device
interrupt timing thing. But you need hundreds of interrupts for that
to be considered really sufficient.

And that's why things broke. It turns out that making ext4 be more
efficient at boot caused fewer disk interrupts, and now we weren't
convinced we had sufficient entropy. And the systemd boot thing just
*stopped* waiting for entropy to magically appear, which is never will
if the machine is idle and not doing anything.

So do we give you "garbage" in getrandom()? We try really really hard
not to, but it's exactly the "can we _guarantee_ that it has entropy"
that ends up being the problem.

So if some silly early boot process comes along, and asks for "true
randomness", and just blocks for it without doing anything else,
that's broken from a kernel perspective.

In practice, the only situation we have had really big problems with
not giving "garbage" isn't actually the "golden distro image" case you
talk about. It's the "embedded device golden _system_ image" case,
where the image isn't just the distribution, but the full bootloader

Some cheap embedded MIPS CPU without even a timestamp counter, with
identical flash contents for millions of devices, and doing a "on
first boot, generate a long-term key" without even connecting to the
network first.

That's the thing Ted was pointing at:

so yes, it can be "garbage", but it can be garbage only if you really
really do things entirely wrong.

But basically, you should never *ever* try to generate some long-lived
key and then just wait for it without doing anything else. The
"without doing anything else" is key here.

But every time we've had a blocking interface, that's exactly what
somebody has done. Which is why I consider that long blocking thing to
be completely unacceptable. There is no reason to believe that the
wait will ever end, partly exactly because we don't consider timer
interrupts to add any timer randomness. So if you are just waiting,
nothing necessarily ever happen.