Re: [RFC 10/10] kmod: add a sanity check on module loading

From: Luis R. Rodriguez
Date: Mon Jan 09 2017 - 14:56:57 EST


On Tue, Jan 10, 2017 at 05:17:22AM +1030, Rusty Russell wrote:
> "Luis R. Rodriguez" <mcgrof@xxxxxxxxxx> writes:
> > On Tue, Jan 03, 2017 at 10:34:53AM +1030, Rusty Russell wrote:
> >> "Luis R. Rodriguez" <mcgrof@xxxxxxxxxx> writes:
> >> > Right, out of ~350 request_module() calls (not included try requests)
> >> > only ~46 check the return value. Hence a validation check, and come to
> >> > think of it, *this* was the issue that originally had me believing
> >> > that in some places we might end up in a null deref --if those open
> >> > coded request_module() calls assume the driver is loaded there could
> >> > be many places where a NULL is inevitable.
> >>
> >> Yes, assuming success == module loade is simply a bug. I wrote
> >> try_then_request_module() to attempt to encapsulate the correct logic
> >> into a single place; maybe we need other helpers to cover (most of?) the
> >> remaining cases?
> >
> > I see...
> >
> > OK so indeed we have a few possible changes to kernel given the above:
> >
> > a) Add SmPL rule to nag about incorrect uses of request_module() which
> > never check for the return value, and fix 86% of calls (304 call sites)
> > which are buggy
>
> Well, checking the return value is merely an optimization. The bug
> is not re-checking for registrations and *assuming existence*.

An optimization, I see.. I was going with using the return value from
request_module() -- clearly that is not proper form. Might as well make this
void ?

> I glanced through the first 100, and they're fine. You are supposed to
> do "request_module()" then "re-check if it's there", and that seems to
> the pattern.

OK I then understand now why you added try_then_request_module() and hinted
to more similar forms. If try_then_request_module() was capturing the
required effort properly then I will note that my grammar rule now finds
one invalid use on drivers/media/usb/as102/as102_drv.c, although its
use seems invalid though the module is loaded for firmware loading
purposes and it seems that is optional at that point in time as such
it does not seem invalid. Not sure if its worth adding a separate API
call for this to annotate its fine to ignore the return value.

One thing I am sure of at this point though is that the loose required
checks for proper form makes it pretty hard to validate the callers.

> > b) Add a new API call, perhaps request_module_assert() which would
> > BUG_ON() if the requested module didn't load, and change the callers
> > which do not check for the return value to this.
> >
> > Make request_module() do the assert and changing all proper callers of
> > request_module() to a new API call which *does* let you check for the
> > return value is another option but tasteless.
> >
> > b) seems to be what you allude to, and while it may seem also of bad taste,
> > in practice it may be hard to get callers to properly check for the return
> > value. I actually just favor a) even though its more work.
>
> No, I meant to look for patterns to see if we could create helpers. But
> I've revised that, since I don't actually see any problems.
>
> In fact, you've yet to identify a single problem user.

Indeed, its actually hard to verify proper "form" for this API given
how different each caller verifies the requested module is present or
loaded.

> >> BTW, I wrote the original "check-for-module-before-loading" in
> >> module-init-tools, but I'm starting to wonder if it was a premature
> >> optimization. Have you thought about simply removing it and always
> >> trying to load the module? If it doesn't slow things down, perhaps
> >> simplicity FTW?
> >
> > I've given this some thought as I tried to blow up request_module() with
> > the new kmod stress test driver and given the small changes I made -- I'm of the
> > mind set it should be based on numbers: if a change improves the time it takes
> > to load modules while also not regressing all the other test cases then we
> > should go with it. The only issue is we don't yet have enough test cases
> > to cover the typical distribution setup: load tons of modules, and only
> > sometimes try to load a few of the same modules.
>
> Just benchmark boot time. That's a pretty good test.

Alright.

Luis