Re: Linux 5.3-rc8

From: Theodore Y. Ts'o
Date: Sun Sep 15 2019 - 22:49:39 EST

On Sun, Sep 15, 2019 at 06:48:34PM -0700, Vito Caputo wrote:
> > A small note here, especially after I've just read the commit log of
> > 72dbcf721566 ('Revert ext4: "make __ext4_get_inode_loc plug"'), which
> > unfairly blames systemd there.
> > What blocked the system boot was GDM/gnome-session implicitly calling
> > getrandom() for the Xorg MIT cookie. This was shown in the strace log
> > below:
> >
">
> > 20190910173243.GA3992@darwi-home-pc

Yes, that's correct; this isn't really systemd's fault. It's a
combination of GDM/gnome-session stupidly using MIT Magic Cookie at
*all* (it was a bad idea 30 years ago, and it's a bad idea in 2019),
and GDM/gnome-session using getrandom(2) at all; it should have just stuck
with /dev/urandom, or, heck, just used random_r(3), since when we're
talking about MIT Magic Cookie, there's no real security *anyway*.

It's also a combination of the hardware used by this particular user,
the init scripts in use that were probably not generating enough read
requests compared to other distributions (ironically, distributions
and init systems that try the hardest to accelerate the boot make this
problem worse by reducing the entropy that can be harvested from I/O).
And then when we optimized ext4 so it would be more efficient, that
tipped this particular user over the edge.
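The I/O side of this can be sketched as a toy model. Every number in it is an assumption: one credit per I/O completion and a 128-credit readiness threshold are hypothetical round figures, not the kernel's actual entropy accounting. What the model does show is the mechanism described above: merging the same reads into fewer completions leaves fewer events to harvest, so the same boot may never reach the threshold.

```c
#include <stdbool.h>

/* Toy model only: both constants are hypothetical, chosen for
 * illustration, and do not reflect the kernel's real accounting. */
#define TOY_CREDIT_PER_COMPLETION 1u
#define TOY_READY_THRESHOLD 128u

struct toy_crng {
    unsigned credits;
    bool ready;      /* one-way flag: never cleared once set */
};

/* Credit one I/O completion; become ready once the threshold is hit. */
void toy_add_completion(struct toy_crng *c)
{
    if (c->ready)
        return;
    c->credits += TOY_CREDIT_PER_COMPLETION;
    if (c->credits >= TOY_READY_THRESHOLD)
        c->ready = true;
}

/* Simulate `reads` read requests merged `batch` at a time into one
 * completion each (batch must be >= 1; batch == 1 means no merging).
 * Returns whether the toy CRNG ended up ready. */
bool toy_boot(unsigned reads, unsigned batch)
{
    struct toy_crng c = { 0, false };
    unsigned completions = (reads + batch - 1) / batch;  /* ceiling */

    for (unsigned i = 0; i < completions; i++)
        toy_add_completion(&c);
    return c.ready;
}
```

With these made-up numbers, 1000 unmerged reads reach the threshold, while the same 1000 reads merged 8 at a time yield only 125 completions and do not.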

Linus might not have liked my proposal to disable the optimization if
the CRNG isn't initialized, but ultimately this problem *has* gotten
worse because we've optimized things more. So to the extent that
systemd has made systems boot faster, you could call that systemd's
"fault" --- just as Linus reverting ext4's performance optimization is
saying that it's ext4's "fault" because we had the temerity to try to
make the file system more efficient, and hence reduce the entropy that
can be collected.

Ultimately, though, whoever touches this last is the one who gets
the "fault". And the problem is that it really is a no-win situation
here. No matter *what* we do, it's going to either (a) make some
systems insecure, or (b) make some systems more likely to hang while
booting. Whether you consider the risk of (a) or (b) to be worse is
ultimately going to cause you to say that people of the contrary
opinion are either "being reckless with system security", or
"incompetent at system design".

And really, it's all going to depend on how the Linux kernel is being
used. The fact that Linux is being used in IoT devices, mobile
handsets, user desktops, servers running in VMs, etc.,
means that there will be some situations where blocking is going to be
terrible, and some situations where a failure to provide system
security could result in risking someone's life, health, or mission
failure in some critical system.

That's why this discussion can easily get toxic. If you are only
focusing on one part of the Linux market, then obviously *you* are the
only sane one, and everyone *else* who disagrees with you must be
incompetent. When, perhaps, they may simply be focusing on a
different part of the ecosystem where Linux is used.

> So did systemd-random-seed instead drain what little entropy there was
> before GDM started, increasing the likelihood a subsequent getrandom()
> call would block?

No. getrandom(2) uses the new CRNG, which is either initialized, or
it's not. Once it's initialized, it won't block again ever.
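From userspace, this either/or state can be probed with getrandom(2)'s GRND_NONBLOCK flag, which makes the call fail with EAGAIN instead of blocking while the CRNG is still uninitialized. The helper `crng_is_ready` below is a hypothetical sketch, assuming glibc 2.25 or later for `<sys/random.h>`. Because initialization is one-way, once the probe reports ready, later calls keep reporting ready; reading random bytes in between does not "drain" the CRNG back into a blocking state.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/random.h>

/* Hypothetical probe: returns 1 if the kernel CRNG is initialized,
 * 0 if a blocking getrandom(2) call would currently wait, and -1 on
 * any other error (e.g. ENOSYS on kernels without getrandom). */
int crng_is_ready(void)
{
    unsigned char byte;
    /* GRND_NONBLOCK: fail with EAGAIN rather than sleep if uninitialized. */
    ssize_t n = getrandom(&byte, sizeof byte, GRND_NONBLOCK);

    if (n == (ssize_t)sizeof byte)
        return 1;
    if (n < 0 && errno == EAGAIN)
        return 0;
    return -1;
}
```

On a system that has finished booting, this will normally return 1 on every call; a 0 would only be expected very early in boot, before the CRNG has been seeded.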

- Ted