Re: kmod: add a sanity check on module loading

From: Jessica Yu
Date: Fri Jan 06 2017 - 16:03:45 EST


+++ Rusty Russell [03/01/17 10:34 +1030]:
"Luis R. Rodriguez" <mcgrof@xxxxxxxxxx> writes:
Maybe a similar hack for try_then_request_module(), but many places seem
to open-code request_module() so it's not as trivial...

Hi Luis, Jessica (who is the main module maintainer now),

Back from break, sorry about delay.

Right, out of ~350 request_module() calls (not included try requests)
only ~46 check the return value. Hence a validation check, and come to
think of it, *this* was the issue that originally had me believing
that in some places we might end up in a null deref --if those open
coded request_module() calls assume the driver is loaded there could
be many places where a NULL is inevitable.

Yes, assuming success == module loade is simply a bug. I wrote
try_then_request_module() to attempt to encapsulate the correct logic
into a single place; maybe we need other helpers to cover (most of?) the
remaining cases?

Granted, I agree they
should be fixed, we could add a grammar rule to start nagging at
driver developers for started, but it does beg the question also of
what a tightly knit validation for modprobe might look like, and hence
this patch and now the completed not-yet-posted alias work.

I really think aliases-in-kernel is too heavy a hammer, but a warning
when modprobe "succeeds" and the module still isn't found would be
a Good Thing.

I was under the impression that aliases were a userspace concern. i.e., we let
kmod tools take care of alias resolution and bookkeeping. I'm getting the
feeling we're bending over backwards here to accommodate buggy/untrustworthy
userspace (modprobe). If I understand correctly, we're performing this
validation work - we're proposing to make the kernel alias-aware - because we
can't even trust modprobe's return value, and the proposal is to double check
this work ourselves in-kernel.

But I thought that request_module() wasn't written to provide these "module is
now live and loaded" guarantees in the first place. This seems to be documented
in kernel/kmod.c - "Callers must check that the service they requested is now
available not blindly invoke it." Isn't it the caller's responsibility to
(indirectly) validate request_module's work, to check that the service they want is
now there? If a caller doesn't do this, then this is a bug on their side. If it
is crucial for get_fs_type() to not fail, then perhaps we should be tightening
get_fs_type() instead, be that WARNing if the requested filesystem is still not
there (as suggested earlier), or maybe even trying the request again.

Would it be worthy as a kconfig kmod debugging aide for now? I can
follow up with a semantic patch to nag about checking the return value
of request_module(), and we can have 0-day then also complain about
new invalid uses.

Yeah, a warning about this would be win for sure.

BTW, I wrote the original "check-for-module-before-loading" in
module-init-tools, but I'm starting to wonder if it was a premature
optimization. Have you thought about simply removing it and always
trying to load the module? If it doesn't slow things down, perhaps
simplicity FTW?

Thanks,
Rusty.