Re: [RFC] Expose request_module via syscall
From: Andy Lutomirski
Date: Wed Sep 22 2021 - 11:34:49 EST
On Wed, Sep 22, 2021, at 5:25 AM, Christian Brauner wrote:
> On Mon, Sep 20, 2021 at 11:36:47AM -0700, Andy Lutomirski wrote:
>> On Mon, Sep 20, 2021 at 11:16 AM Luis Chamberlain <mcgrof@xxxxxxxxxx> wrote:
>> >
>> > On Mon, Sep 20, 2021 at 04:51:19PM +0200, Thomas Weißschuh wrote:
>>
>> > > > Do you mean it literally invokes /sbin/modprobe? If so, hooking this
>> > > > at /sbin/modprobe and calling out to the container manager seems like
>> > > > a decent solution.
>> > >
>> > > Yes it does. Thanks for the idea, I'll see how this works out.
>> >
>> > Would documentation guiding you in that way have helped? If so
>> > I welcome a patch that does just that.
>>
>> If someone wants to make this classy, we should probably have the
>> container counterpart of a standardized paravirt interface. There
>> should be a way for a container to, in a runtime-agnostic way, issue
>> requests to its manager, and requesting a module by (name, Linux
>> kernel version for which that name makes sense) seems like an
>> excellent use of such an interface.
>
> I always thought of this in two ways we currently do this:
>
> 1. Caller transparent container manager requests.
> This is the seccomp notifier where we transparently handle syscalls
> including intercepting init_module() where we parse out the module to
> be loaded from the syscall args of the container and if it is
> allow-listed load it for the container otherwise continue the syscall
> letting it fail or failing directly through seccomp return value.
Specific problems here include aliases and dependencies. My modules.alias file, for example, has:

    alias net-pf-16-proto-16-family-wireguard wireguard

If I do modprobe net-pf-16-proto-16-family-wireguard, modprobe parses some files in /lib/modules/`uname -r` and issues init_module() asking for 'wireguard'. So hooking init_module() is at the wrong layer -- for that to work, the container's /sbin/modprobe needs to already have figured out that the desired module is wireguard and have a .ko for it.
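
To make the aliasing point concrete, here is a rough sketch of the lookup that has to happen in userspace before init_module() is ever reached. It is not how modprobe is actually implemented (the real tool reads binary indexes, modules.dep, and configuration files as well); it only assumes the "alias <pattern> <module>" text format shown above and glob-style matching. The function name resolve_alias and the sample data are made up for illustration.

```python
import fnmatch

# Sample data in modules.alias text format. The one alias entry is the
# line quoted above; a real file has thousands of entries.
SAMPLE_MODULES_ALIAS = """\
# Aliases extracted from modules themselves.
alias net-pf-16-proto-16-family-wireguard wireguard
"""


def resolve_alias(name, alias_text):
    """Return the module names whose alias pattern matches `name`.

    Simplified illustration: only "alias <pattern> <module>" lines are
    considered, and patterns are matched as shell-style globs.
    """
    matches = []
    for line in alias_text.splitlines():
        fields = line.split()
        if len(fields) != 3 or fields[0] != "alias":
            continue  # skip comments, blank lines, and anything else
        _, pattern, module = fields
        if fnmatch.fnmatchcase(name, pattern):
            matches.append(module)
    return matches


if __name__ == "__main__":
    print(resolve_alias("net-pf-16-proto-16-family-wireguard",
                        SAMPLE_MODULES_ALIAS))
```

A hook at init_module() only sees the result of this step, so it never learns that the original request was for the net-pf-16-... alias.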
>
> 2. A process in the container explicitly calling out to the container
> manager.
> One example how this happens is systemd-nspawn via dbus messages
> between systemd in the container and systemd outside the container to
> e.g. allocate a new terminal in the container (kinda insecure but
> that's another issue) or other stuff.
>
> So what was your idea: would it be like a device file that could be
> exposed to the container where it writes requests to the container
> manager? What would be the advantage to just standardizing a socket
> protocol which is what we do for example (it doesn't do module loading
> of course as we handle that differently):
My idea is standardizing *something*. I think it would be nice if, for example, distros could ship a /sbin/modprobe that would do the right thing inside any compliant container runtime as well as when running outside a container.
I suppose container managers could also bind-mount over /sbin/modprobe, but that's more intrusive.
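
For the sake of discussion, a distro-shipped /sbin/modprobe along those lines might look roughly like the sketch below. Everything specific in it is an assumption on my part and not an existing interface: the socket path, the JSON message layout, and the function names are all hypothetical. The only parts taken from this thread are the idea of sending (name, kernel version) to the manager and of falling back to normal behavior outside a container.

```python
import json
import os
import platform
import socket

# Hypothetical rendezvous point. No such standard path exists today.
MANAGER_SOCKET = "/run/container-manager/request.sock"


def build_request(name, kernel_release=None):
    """Encode a module request as one JSON line (made-up wire format)."""
    if kernel_release is None:
        # Kernel release as seen by the caller, e.g. the output of uname -r.
        kernel_release = platform.release()
    msg = {
        "op": "request_module",
        "name": name,
        "kernel_release": kernel_release,
    }
    return json.dumps(msg, sort_keys=True).encode() + b"\n"


def choose_backend(socket_path=MANAGER_SOCKET):
    """Use the manager if its socket is present, else the regular modprobe."""
    return "manager" if os.path.exists(socket_path) else "modprobe"


def send_request(name, socket_path=MANAGER_SOCKET):
    """Send the request to the manager and return its raw reply bytes."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(socket_path)
        s.sendall(build_request(name))
        return s.recv(4096)
```

The point of sending the unresolved name together with the kernel release is that the manager, not the container, gets to do the alias and dependency resolution against the host's /lib/modules.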