Re: [PATCH RFC v2] random: optionally block in getrandom(2) when the CRNG is uninitialized

From: Ahmed S. Darwish
Date: Sun Sep 15 2019 - 22:46:09 EST


On Sun, Sep 15, 2019 at 11:59:41AM -0700, Linus Torvalds wrote:
> On Sun, Sep 15, 2019 at 11:32 AM Willy Tarreau <w@xxxxxx> wrote:
> >
> > I think that the exponential decay will either not be used or
> > be totally used, so in practice you'll always end up with 0 or
> > 30s depending on the entropy situation
>
> According to the systemd random-seed source snippet that Ahmed posted,
> it actually just tries once (well, first once non-blocking, then once
> blocking) and then falls back to reading urandom if it fails.
>
> So assuming there's just one of those "read much too early" cases, I
> think it actually matters.
>

Just a quick note, the snippet I posted:

https://lkml.kernel.org/r/20190914150206.GA2270@darwi-home-pc

is not PID 1.

It's just a lowly process called "systemd-random-seed". Its main
reason for existence is to save a random seed file to disk and restore
it across reboots (just like what sysv scripts did).

The reason I posted it was to show that if we change getrandom() to
silently return weak crypto instead of blocking or returning an error
code, systemd-random-seed will break: it will save the resulting data
to disk, then even _credit_ it (if asked to) in the next boot cycle
through RNDADDENTROPY.
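
To make the dependency concrete, here is a minimal sketch of the kind
of check such a seed saver relies on. It is not systemd's actual code:
read_seed() and the seed_quality enum are hypothetical names made up
for illustration. The only point is that the caller learns whether the
CRNG was ready from getrandom()'s EAGAIN error, so if the kernel
silently returned weak data instead, the "ready" branch would be taken
and the seed could later be credited.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical result codes, for illustration only. */
enum seed_quality {
	SEED_FAILED = -1,        /* unexpected error */
	SEED_CRNG_READY = 0,     /* CRNG was initialized: safe to credit later */
	SEED_CRNG_NOT_READY = 1, /* EAGAIN: caller must not credit this boot's seed */
};

/* Hypothetical helper: try a non-blocking getrandom() and report readiness. */
static enum seed_quality read_seed(void *buf, size_t len)
{
	ssize_t n;

	/* Requests of up to 256 bytes are not returned partially. */
	assert(len <= 256);

	do {
		n = getrandom(buf, len, GRND_NONBLOCK);
	} while (n < 0 && errno == EINTR);

	if (n == (ssize_t)len)
		return SEED_CRNG_READY;
	if (n < 0 && errno == EAGAIN)
		return SEED_CRNG_NOT_READY;
	return SEED_FAILED;
}
```

A caller would only write a seed file marked as creditable when it gets
SEED_CRNG_READY. That decision is exactly what stops being trustworthy
if the error path disappears.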

> But while I tried to test this, on my F30 install, systemd seems to
> always just use urandom().
>
> I can trigger the urandom read warning easily enough (turn of CPU
> rdrand trusting and increase the entropy requirement by a factor of
> ten, and turn of the ioctl to add entropy from user space), just not
> the getrandom() blocking case at all.
>

Yeah, because the problem was/is not with systemd :)

It is GDM/gnome-session which was blocking the graphical boot process.

Regarding reproducing the issue: according to a quick trace_printk,
all of the processes below are calling getrandom() on my Arch system
at boot:

https://lkml.kernel.org/r/20190912034421.GA2085@darwi-home-pc

The fatal call was gnome-session's, because gnome didn't continue
_its own_ boot due to this blockage.

> So presumably that's because I have a systemd that doesn't use
> getrandom() at all, or perhaps uses the 'rdrand' instruction directly.
> Or maybe because Arch has some other oddity that just triggers the
> problem.
>

It seems Arch is good at triggering this. For example, here is
another Arch user on a Thinkpad (a different model from mine), also
with GDM getting blocked on entropy:

https://bbs.archlinux.org/viewtopic.php?id=248035

"As you can see, the system is literally waiting a half minute for
something - up until crng init is done"

(The NetworkManager logs are just noise. I also had them, but completely
disabling NetworkManager didn't change anything; it just made the logs
cleaner.)

thanks,

--
Ahmed Darwish
http://darwish.chasingpointers.com