Re: [PATCH RFC v2] random: optionally block in getrandom(2) when the CRNG is uninitialized

From: Willy Tarreau
Date: Sun Sep 15 2019 - 15:55:26 EST


On Sun, Sep 15, 2019 at 12:31:42PM -0700, Linus Torvalds wrote:
> On Sun, Sep 15, 2019 at 12:18 PM Willy Tarreau <w@xxxxxx> wrote:
> >
> > Oh no I definitely don't want this behavior at all for urandom, what
> > I'm saying is that as long as getrandom() will have a lower quality
> > of service than /dev/urandom for non-important randoms
>
> Ahh, here you're talking about the fact that it can block at all being
> "lower quality".
>
> I do agree that getrandom() is doing some odd things. It has the
> "total blocking mode" of /dev/random (if you pass it GRND_RANDOM), but
> it has no mode of replacing /dev/urandom.

Yep, but with your change it's getting better.

> So if you want the /dev/urandom behavior, then no, getrandom() simply
> has never given you that.
>
> Use /dev/urandom if you want that.

It's not available in a chroot, which I guess is the main driver for
getrandom().
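
To illustrate (just a sketch, not something from the patch): getrandom()
needs no device node at all, which is why it's attractive in a chroot
where /dev/urandom cannot be opened; the only fallback needed is for old
kernels that don't have the syscall and return ENOSYS:

#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/random.h>
#include <unistd.h>

static int get_bytes(void *buf, size_t len)
{
	/* works in a chroot, no /dev needed; may block until the CRNG is ready */
	ssize_t ret = getrandom(buf, len, 0);

	if (ret == (ssize_t)len)
		return 0;

	if (ret < 0 && errno == ENOSYS) {
		/* kernel without getrandom(): /dev/urandom, if it exists at all */
		int fd = open("/dev/urandom", O_RDONLY);

		if (fd < 0)
			return -1;
		ret = read(fd, buf, len);
		close(fd);
	}
	return ret == (ssize_t)len ? 0 : -1;
}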

> Sad, but there it is. We could have a new flag (GRND_URANDOM) that
> actually gives the /dev/urandom behavior. But the ostensible reason
> for getrandom() was the blocking for entropy. See commit c6e9d6f38894
> ("random: introduce getrandom(2) system call") from back in 2014.

Oh I definitely know it's been a long debate.

> The fact that it took five years to hit this problem is probably due
> to two reasons:
>
> (a) we're actually pretty good about initializing the entropy pool
> fairly quickly most of the time
>
> (b) people who started using 'getrandom()' and hit this issue
> presumably then backed away from it slowly and just used /dev/urandom
> instead.

We hit it the hard way more than a year ago, when openssl adopted
getrandom() instead of /dev/urandom for certain low-importance things,
in order to work better in chroots and/or to avoid fd leaks. Even
openssl had to work around these issues over multiple iterations
(though I don't remember how).

> So it needed an actual "oops, we don't get as much entropy from the
> filesystem accesses" situation to actually turn into a problem. And
> presumably the people who tried out things like nvdimm filesystems
> never used Arch, and never used a sufficiently new systemd to see the
> "oh, without disk interrupts you don't get enough randomness to boot".

In my case the whole system lives in the initramfs and the only
accesses to the flash are to read the config, so that's a pretty
limited source of interrupts for a headless system ;-)

> One option is to just say that GRND_URANDOM is the default (ie never
> block, do the one-liner log entry to warn) and add a _new_ flag that
> says "block for entropy". But if we do that, then I seriously think
> that the new behavior should have that timeout limiter.

I think the timeout is a good thing to do, but it would be nice to
let the application know that what it got was probably not as good
as expected (well, if the application wants truly random data, it
should use GRND_RANDOM).
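
FWIW, just a sketch of what I mean, with nothing new needed in the
kernel for this part: with the existing GRND_NONBLOCK flag an
application can already learn that the CRNG isn't seeded yet and decide
for itself whether to wait or to knowingly accept best-effort bytes,
instead of silently getting lower-quality output (this assumes
/dev/urandom is reachable for the fallback, which it isn't in my chroot
case, but that's a separate problem):

#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/random.h>
#include <unistd.h>

/* fills buf; sets *degraded when the bytes come from a not-yet-seeded pool */
static int get_bytes_flagged(void *buf, size_t len, bool *degraded)
{
	*degraded = false;

	if (getrandom(buf, len, GRND_NONBLOCK) == (ssize_t)len)
		return 0;		/* CRNG is ready */

	if (errno != EAGAIN)
		return -1;

	/* EAGAIN: CRNG not initialized yet.  The caller now knows the
	 * quality caveat and here takes the never-blocking urandom path. */
	*degraded = true;
	int fd = open("/dev/urandom", O_RDONLY);
	if (fd < 0)
		return -1;
	ssize_t ret = read(fd, buf, len);
	close(fd);
	return ret == (ssize_t)len ? 0 : -1;
}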

> For 5.3, I'll just revert the ext4 change, stupid as that is. That
> avoids the regression, even if it doesn't avoid the fundamental
> problem. And gives us time to discuss it.

It's sad to see that being overly strict about randomness ends up
forcing a totally unrelated subsystem to be less efficient :-(

Willy