Re: [PATCH RFC v2] random: optionally block in getrandom(2) when the CRNG is uninitialized

From: Linus Torvalds
Date: Sun Sep 15 2019 - 15:32:04 EST


On Sun, Sep 15, 2019 at 12:18 PM Willy Tarreau <w@xxxxxx> wrote:
>
> Oh no I definitely don't want this behavior at all for urandom, what
> I'm saying is that as long as getrandom() will have a lower quality
> of service than /dev/urandom for non-important randoms

Ahh, here you're talking about the fact that it can block at all being
"lower quality".

I do agree that getrandom() is doing some odd things. It has the
"total blocking mode" of /dev/random (if you pass it GRND_RANDOM), but
it has no mode that replaces /dev/urandom.
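
For reference, a minimal userspace sketch of the modes getrandom() does
have. It assumes glibc 2.25 or later for <sys/random.h>, and the helper
name getrandom_noblock() is made up for illustration:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/*
 * Hypothetical helper: ask for random bytes without ever blocking.
 *
 *   flags == 0       blocks until the CRNG has been initialized
 *   GRND_RANDOM      draws from the /dev/random pool, may block
 *   GRND_NONBLOCK    fails with EAGAIN instead of blocking
 *
 * Returns the number of bytes written, or -1 with errno set.
 */
static ssize_t getrandom_noblock(void *buf, size_t len)
{
	ssize_t n;

	/* Retry only if a signal interrupted the call. */
	do {
		n = getrandom(buf, len, GRND_NONBLOCK);
	} while (n < 0 && errno == EINTR);

	return n;
}
```

Note that none of these flag combinations hands back bytes before the
CRNG is initialized, which is the one thing /dev/urandom will do.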

So if you want the /dev/urandom behavior, then no, getrandom() simply
has never given you that.

Use /dev/urandom if you want that.

Sad, but there it is. We could have a new flag (GRND_URANDOM) that
actually gives the /dev/urandom behavior. But the ostensible reason
for getrandom() was the blocking for entropy. See commit c6e9d6f38894
("random: introduce getrandom(2) system call") from back in 2014.

The fact that it took five years to hit this problem is probably due
to two reasons:

(a) we're actually pretty good about initializing the entropy pool
fairly quickly most of the time

(b) people who started using 'getrandom()' and hit this issue
presumably then backed away from it slowly and just used /dev/urandom
instead.

So it needed an actual "oops, we don't get as much entropy from the
filesystem accesses" situation to turn into a problem. And
presumably the people who tried out things like nvdimm filesystems
never used Arch, and never used a sufficiently new systemd to see the
"oh, without disk interrupts you don't get enough randomness to boot".

One option is to just say that GRND_URANDOM is the default (i.e. never
block, do the one-liner log entry to warn) and add a _new_ flag that
says "block for entropy". But if we do that, then I seriously think
that the new behavior should have that timeout limiter.
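
To illustrate the "block for entropy, but with a timeout" idea, here is
a rough userspace approximation. It is not the kernel change under
discussion; it only polls with the existing GRND_NONBLOCK flag. The
helper name, the return convention and the 10 ms poll interval are all
assumptions made for the sketch:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>
#include <time.h>

/*
 * Hypothetical helper: wait at most roughly timeout_ms for the CRNG to
 * be initialized, polling every 10 ms.
 * Returns 0 if buf was filled, 1 on timeout, -1 on any other error.
 */
static int getrandom_timeout(void *buf, size_t len, long timeout_ms)
{
	const struct timespec step = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
	unsigned char *p = buf;
	long waited_ms = 0;

	for (;;) {
		ssize_t n = getrandom(p, len, GRND_NONBLOCK);

		if (n >= 0) {
			/* Large requests may come back short. */
			p += n;
			len -= (size_t)n;
			if (len == 0)
				return 0;
			continue;
		}
		if (errno == EINTR)
			continue;
		if (errno != EAGAIN)
			return -1;	/* e.g. ENOSYS on old kernels */
		if (waited_ms >= timeout_ms)
			return 1;	/* CRNG still not ready */

		/* Approximate accounting: an early wakeup still counts as 10 ms. */
		nanosleep(&step, NULL);
		waited_ms += 10;
	}
}
```

On timeout the caller gets to decide what to do: log the one-liner and
fall back to /dev/urandom, or fail.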

For 5.3, I'll just revert the ext4 change, stupid as that is. That
avoids the regression, even if it doesn't avoid the fundamental
problem. And gives us time to discuss it.

Linus