Re: [GIT PULL] user namespace and namespace infrastructure changesfor 3.8
From: Andy Lutomirski
Date: Thu Dec 13 2012 - 14:23:56 EST
On 12/11/2012 01:17 PM, Eric W. Biederman wrote:
>
> Linus,
>
> Please pull the for-linus git tree from:
>
> git://git.kernel.org:/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus
>
> HEAD: 98f842e675f96ffac96e6c50315790912b2812be proc: Usable inode numbers for the namespace file descriptors.
>
> This tree is against v3.7-rc3
You've just allowed unprivileged users to create new pid namespaces,
etc, by creating a new userns, then creating a new pid namespace inside
that userns, then setns-ing from outside the userns into the pid ns. Is
this intentional? (The mount ns is okay -- it checks for CAP_CHROOT on
setns.)
In user_namespace.c:
/* Threaded many not enter a different user namespace */
if (atomic_read(¤t->mm->mm_users) > 1)
return -EINVAL;
The comment has a typo. Also, you're checking the wrong condition:
that's whether the vm is shared, not whether the thread group has more
than one member.
In any case, why are threads special here?
I think, although I haven't verified it, that these changes allow
CAP_SYS_ADMIN to bypass the bounding set (and, in particular, to gain
CAP_MODULE): unshare the user namespace and then setfd yourself back. I
think that setns should only grant caps when changing to a descendent
namespace.
Also in userns_install:
796 /* Don't allow gaining capabilities by reentering
797 * the same user namespace.
798 */
799 if (user_ns == current_user_ns())
800 return -EINVAL;
Why? You can trivially bypass this by creating a temporary user ns.
(If you're the owner of your own ns, then you can create a subsidiary
ns, map yourself into it, then setns back -- you'll still be the owner.)
unshare has a bug. This code:
#define _GNU_SOURCE
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/prctl.h>
#include <sys/mman.h>
#include <sched.h>
#include <fcntl.h>
static void fail(const char *msg)
{
perror(msg);
exit(1);
}
int main()
{
if (unshare(CLONE_NEWUSER) != 0)
fail("CLONE_NEWUSER");
if (open("/proc/self/uid_map", O_RDWR) == -1)
perror("/proc/self/uid_map O_RDWR");
int fd = open("/proc/self/uid_map", O_RDONLY);
if (fd == -1) {
perror("/proc/self/uid_map O_RDONLY");
} else {
char buf[4096];
ssize_t len = read(fd, buf, sizeof(buf));
if (len > 0)
write(1, buf, len);
else
printf("read uid_map returned %d\n", (int)len);
}
}
produces this output:
/proc/self/uid_map O_RDWR: Permission denied
read uid_map returned 0
With clone instead of unshare, it works. I'm not quite sure what's
going on.
Also, I'm entirely unconvinced that the owner of a userns should
automatically have all caps (in the cap_capable sense) on a userns if
the task is inside that ns. What's wrong with just using normal caps?
(Of course, the fact that caps don't usefully inherit is an issue --
there's a loooong thread going on right now about that.) But doing this
enshrines root-has-caps even farther into ABI. At least please make
this depend on !SECURE_NOROOT.
>
> While small this set of changes is very significant with respect to
> containers in general and user namespaces in particular. The user space
> interface is now complete.
>
> This set of changes adds support for unprivileged users to create user
> namespaces and as a user namespace root to create other namespaces. The
> tyrrany of supporting suid root preventing unprivileged users from using
> cool new kernel features is broken.
>
no_new_privs already (kind of) did that :)
--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/