Re: user ns: arbitrary module loading

From: Eric W. Biederman
Date: Sun Mar 03 2013 - 16:58:40 EST


Kees Cook <keescook@xxxxxxxxxx> writes:

> On Sat, Mar 2, 2013 at 8:12 PM, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>> "Serge E. Hallyn" <serge@xxxxxxxxxx> writes:
>>
>>> Quoting Kees Cook (keescook@xxxxxxxxxx):
>>>> The rearranging done for user ns has resulted in allowing arbitrary
>>>> kernel module loading[1] (i.e. re-introducing a form of CVE-2011-1019)
>>>> by what is assumed to be an unprivileged process.
>>>>
>>>> At present, it does look to require at least CAP_SETUID along the way
>>>> to set up the uidmap (but things like the setuid helper newuidmap
>>>> might soon start providing such a thing by default).
>>
>> CAP_SETUID is not needed.
>
> Do you have an example? I wasn't able to gain capabilities within the
> userns until I had a uid map set that allowed my uid to map to root.
> (I needed CAP_SYS_ADMIN in the mnt ns to get past may_mount() before I
> could touch get_fs_type().)

Likely because exec drops capbilities if your uid isn't 0.

Without CAP_SETUID you can still map uid to 0 to your current uid.

The following shell script is an easy basis for testing and playing
around with things. It does assume a recent version of unshare from util-linux.

#!/bin/sh
export IFIFO=/tmp/userns-test-$$-in
export OFIFO=/tmp/userns-test-$$-out
rm -f $IFIFO $OFIFO
mkfifo $IFIFO
mkfifo $OFIFO
unshare --user -- /bin/bash -s <<'EOF' &
echo waiting-for-uid-and-gid-maps > $OFIFO
read LINE < $IFIFO
exec unshare --mount --net -- /bin/sh -s <<'EOF2'
mount --bind $HOME /root/
mount --bind /dev/null /dev/log
# Start a shell to keep the namespace reference alive
$SHELL -i < /dev/tty > /dev/tty 2> /dev/tty
EOF2
EOF
child=$!
read LINE < $OFIFO
uid=$(id --user)
gid=$(id --group)
echo "0 $uid 1" > /proc/$child/uid_map
echo "0 $gid 1" > /proc/$child/gid_map
echo uid-and-gid-maps > $IFIFO
wait $child


>> At a practical level I don't see adding a capalbe(CAP_SYS_ADMIN) check
>> as having much effect for the functionality currently present in user
>> namespaces today as the filesystems that an legal to mount in a user
>> namespace (ramfs, tmpfs, mqueuefs, sysfs, proc, devpts) are so common
>> most of them can not even be built as modules and even if they are
>> modules the modules will already be loaded. So I will see about adding
>> a capable(CAP_SYS_ADMIN) check to shore things up for the short term.
>
> For the short-term, yes, this solves the "regression". But it
> obviously isn't what is desired for userns in the long term.


>> So I am wondering what I a good policy should be. Should we trust
>> kernel modules to not be buggy (especially if they were signed as part
>> of the build process)? Do we add some defense in depth and add
>> filesystem registration magic? Thinking...
>
> If we can produce a mechanism that provides some defensive design, we
> should do it. There will always be bugs, so we should always try to
> make them harder to get to or harder to exploit. Reducing the
> available attack surface to "just filesystems" would be a win here,
> and it would be consistent with the _intent_ of existing kernel code.

Agreed.

>> We can limit the request_module in get_fs_type to just filesystems
>> fairly easily.
>>
>> In include/linux/fs.h:
>>
>> #define MODULE_ALIAS_FS(type) MODULE_ALIAS("fs-" __stringify(type))
>>
>> In fs/filesystems.c:
>>
>> if (request_moudle("fs-%.*s", len, name) == 0)
>>
>> Then just add the appropriate MODULE_ALIAS_FS lines in all of the
>> filesystems. This also allows user space to say set the module loading
>> policy for filesystems using the blacklist and the alias keywords
>> in /etc/modprobe.d/*.conf.
>
> This was the solution for netdev. The backward compat situation is this:
>
> if (no_module && capable(CAP_NET_ADMIN))
> no_module = request_module("netdev-%s", name);
> if (no_module && capable(CAP_SYS_MODULE)) {
> if (!request_module("%s", name))
> pr_warn("Loading kernel module for a network
> device with CAP_SYS_MODULE (deprecated). Use CAP_NET_ADMIN and alias
> netdev-%s instead.\n",
> name);
> }

If I have read it right the backwards compat case is there to be
compatibile with existing peoples configuration who had created specific
aliases like eth0 for specific network devices in
/etc/modprobe.d/*.conf. Where the new netdev-* would break userspace.

I don't believe anyone does anything like that for filesystems, and even
if they tried it wouldn't work because the registered filesystem type
would have a different name. So I don't see needing to worry about that
kind of backwards compatibility for filesystems.

> Those aren't ns_capable, though, so right now userns can't trigger
> loading new network drivers via ifconfig, but this seems like a
> reasonable approach to take.

Yes. I saw those and deliberately didn't make those ns_capable, to be
on the safe side when I made the rest of the networking ns_capable.
That and ns_capabable is not meaningful as a check for loading modules.
The only interesting question is can the global root user trigger them
or can everyone trigger module loading.

>> That seems a whole lot simpler, more powerful and more maintainable than
>> what little I saw in GRKERNSEC_MODHARDEN to prevent loading of
>> non-filesystem modules from get_fs_type.
>>
>> Eric
>>
>> p.s. This is the patch I am looking at pushing to Linus in the near
>> future.
>>
>> diff --git a/fs/filesystems.c b/fs/filesystems.c
>> index da165f6..5b0644d 100644
>> --- a/fs/filesystems.c
>> +++ b/fs/filesystems.c
>> @@ -273,7 +273,8 @@ struct file_system_type *get_fs_type(const char *name)
>> int len = dot ? dot - name : strlen(name);
>>
>> fs = __get_fs_type(name, len);
>> - if (!fs && (request_module("%.*s", len, name) == 0))
>> + if (!fs && capable(CAP_SYS_ADMIN) &&
>> + (request_module("%.*s", len, name) == 0))
>> fs = __get_fs_type(name, len);
>>
>> if (dot && fs && !(fs->fs_flags & FS_HAS_SUBTYPE)) {
>
> Will this break other users of get_fs_type()? I think it's okay; it
> looks like all the other users are expected to be privileged.

I read through them all, and I didn't see problems. The most concerning
one is tomoyo's use of get_fs_type before capabilities are checked in
do_mount. But that is neither here nor there.

> Should this look like netdev in the future? Something like this, after
> making may_mount() available to fs/filesystems.c:
>
> - if (!fs && (request_module("%.*s", len, name) == 0))
> + if (!fs && ((may_mount() && request_module("fs-%.*s", len,
> name) == 0) ||
> + (capable(CAP_SYS_ADMIN) && request_module("%.*s",
> len, name) == 0)))
>
> and adding all the filesystem module aliases.
>
> At some point maybe we can add a flag to kill the old unqualified
> paths available to CAP_SYS_ADMIN for netdev and mount...

For netdev at least I don't see it as being particularly interesting.
The case for tunnels is already as unprivileged as you can reasonably
get with "request_module("rtnl-link-%s", kind);" in
net/core/rtnetlink.c:rtnl_newlink(). For real physical devices there is
both greater chance of a buggy module and no realy need as udev will
load the module based on hardware auto-discovery.

This leads to the fundamental question: Should we require privilege to
request the load filesystem modules?

I have looked at GRKERNSEC_MODHARDEN to see if that could give me some
guidance. Unfortunately GRKERNSEC_MODHARDEN takes the position that fs
kernel modules are the only kernel modules that should ever auto-load
and only in very specific situations. So I can't see that being the
normal kernel policy, especially since there are the sysctls
/proc/sys/kernel/modprobe and /proc/sys/kernel/modules_disabled.

Overall the basic policy building blocks for controlling which modules
are loaded seem solid and in use. So I don't see any particular reason
why the kernel's default policy should not be to allow any users actions
to request modules.

So I think I am going to scrap the change sitting in my development tree
to require capalbe(CAP_SYS_ADMIN) to load a filesystem module and just
go with my request_module("fs-%.*",...); change. That is simple and
seems to match the rest of the kernel.

Does anyone see a reason why we should need CAP_SYS_ADMIN or be in the
initial user namespace to trigger a request of filesystem modules?

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/