Re: [RFC] fs: add userspace critical mounts event support

From: Luis R. Rodriguez
Date: Tue Oct 04 2016 - 20:00:25 EST

On Sat, Sep 24, 2016 at 10:41:46AM -0700, Dmitry Torokhov wrote:
> On Fri, Sep 23, 2016 at 6:37 PM, Herbert, Marc <marc.herbert@xxxxxxxxx> wrote:
> > On 03/09/2016 11:10, Dmitry Torokhov wrote:
> >> I was thinking if we kernel could post
> >> "conditions" (maybe simple stings) that it waits for, and userspace
> >> could unlock these "conditions". One of them might be "firmware
> >> available".
> >
> > On idea offered by Josh Triplett that seems to overlap with this one
> > is to have something similar to the (deprecated) userhelper with
> > *per-blob* requests and notifications except for one major difference:
> > userspace would not anymore be in charge of *providing* the blob but
> > would instead only *signal* when a given blob becomes available and is
> > either found or found missing. Then the kernel loads the blob _by
> > itself_; unlike the userhelper. No new âcritical filesystemâ concept
> > and a *per-blob basis*, allowing any variation of blob locations
> > across any number of initramfs and filesystems.
> >
> Really, I do not quite understand why people have issues with usermode
> helper/uevents.

One reason is you'd have to implement your own cache for suspend/resume.

> It used to work reasonably well (if you were using
> request_firmware_nowait()), as the kernel would post the request and
> then, when userspace was ready[^Hier], uevents would be processed and
> firmware would be loaded. We had a timeout of 60(?) seconds by
> default, but that would be adjusted as systems needed.

The issue with the timeout was kernel developers *assumed* module init
and probe were detached, and saying 'thou shall not load firmware on
probe' seems actually like a more radical change than just saying
'thou shall load firmware on init'. I'll note that as it stands
its the right thing to complain about these users only because we
lack the semantics to ensure correctness if used on init or probe.
The timeout incurred huge latencies for optional firmwares, and
while we had a new API added to avoid the wait on optional firmware,
that obviously still leaved the races as possible. We now have async
probe which *does* enable some original misconceptions by kernel
developers, but by now other issues have also been found on the
usermode helper, the cache was one, another one was a recent discusion
over the user of the UMH lock with the assumption this was providing
a sort of safeguard on early boot use -- it does not, for the same
exact reasons why a UMH lock does not suffice to avoid all possible
rootfs races. For this later issue refer to a recent discussion in
review with Daniel Wagner's patches.

> Unfortunately it all broke when udev started insisting [1] on
> servicing some uevents in strict sequence, which resulted in boot
> stalls.

That was not the only issue... another implicit issue was that
you are reducing the number of possible supported number of
devices Linux supports per module by the timeout, it would
depend on the combine time it takes to both init and probe.
Some drivers are super complex and even if you *don't* have
firmware requirements and say burn the firmware onto a device
we found that *probe* alone was taking a long long time on some
device drivers -- check out cxgb4 driver, where one device actually
ends up loading about 4 subdevices underneath it. Yes that's a mess
and the driver needs a major rewrite to address this in a clean way
but that takes time. Its no trivial pursuit. The umh timeout then
would not be implicated anymore *but* since systemd implemented the
timeout in general for kmod loading it did mean system was limiting
them Linux drivers and how much devices a driver can support
depending on this timeout value. At SUSE we solved this by lifting
this timeout for kmod workers for now. A long term goal here, which
could help, is also to just detach init and probes, so we give to
system what it originally thought. Summary of this all is here:

I have some code that starts to enable some of this on systemd/kmod
but it still needs some more testing before I post.

> Maybe the ultimate answer is to write a firmware loading
> daemon that would also listen to netlink events and do properly what
> udev refused to be doing?

Meh, in the wireless subsystem we devised our own file loader,
check CRDA. That worked for us since we needed to optionally
enable digital RSA signed file checking, but long term our
experience is that this is pointless. So we're going to phase
that out in favor of using the firmware API for the file loading
of this file, and support then digital signatures on the firmware.

I am not sure how/why a firmware loading daemon would be a better
idea now. What Marc describes that Josh proposed with signals for
userspcae seems more aligned with what we likely need -- but note
that since we now use a shared common API for kernel reads from a
path via kernel_read_file_from_path() we'd probably want something
like a notifier for any kernel_read_file_from_path() user. The ability
for the kernel to register a generic userspace notifier seems worthy
of consideration, but I'd be surprised if we don't already have
something quite like this already?

> The distribution would know when it is ready
> to service firmware requests (and thus when to start this daemon), and
> we would have the freedom of having drivers both built-in and as
> modules and bulding firmware into kernel, intiramfs or keep on a
> "real" fs available at later time.

What difference would there be if we just used notifications to
guarantee to the kernel the file in question is now available?