[PATCH 1/1] user namespaces: let user_ns be cloned with fairsched

From: Serge E . Hallyn
Date: Mon Dec 08 2008 - 10:24:33 EST


fairsched creates a per-uid directory under /sys/kernel/uids/.
So when you clone(CLONE_NEWUSER), it tries to create
/sys/kernel/uids/0, which already exists, and you get back
-ENOMEM.

This was supposed to be fixed by sysfs tagging, but that
was postponed (ok, rejected until sysfs locking is fixed).
So, just as with network namespaces, we just don't create
those directories for user namespaces other than the init.

Changelog:
Dec 8 2008: Documented the currently bogus state of
support for user groups with user namespaces. In
particular, all users in a user namespace should be
children of the user which created the user namespace.
This is yet to be unimplemented.

Signed-off-by: Serge E. Hallyn <serue@xxxxxxxxxx>
Acked-by: Dhaval Giani <dhaval@xxxxxxxxxxxxxxxxxx>
---
Documentation/scheduler/sched-design-CFS.txt | 21 +++++++++++++++++++++
kernel/user.c | 12 +++++++++++-
2 files changed, 32 insertions(+), 1 deletions(-)

diff --git a/Documentation/scheduler/sched-design-CFS.txt b/Documentation/scheduler/sched-design-CFS.txt
index eb471c7..8398ca4 100644
--- a/Documentation/scheduler/sched-design-CFS.txt
+++ b/Documentation/scheduler/sched-design-CFS.txt
@@ -273,3 +273,24 @@ task groups and modify their CPU share using the "cgroups" pseudo filesystem.

# #Launch gmplayer (or your favourite movie player)
# echo <movie_player_pid> > multimedia/tasks
+
+8. Implementation note: user namespaces
+
+User namespaces are intended to be hierarchical. But they are currently
+only partially implemented. Each of those has ramifications for CFS.
+
+First, since user namespaces are hierarchical, the /sys/kernel/uids
+presentation is inadequate. Eventually we will likely want to use sysfs
+tagging to provide private views of /sys/kernel/uids within each user
+namespace.
+
+Second, the hierarchical nature is intended to support completely
+unprivileged use of user namespaces. So if using user groups, then
+we want the users in a user namespace to be children of the user
+who created it.
+
+That is currently unimplemented. So instead, every user in a new
+user namespace will receive 1024 shares just like any user in the
+initial user namespace. Note that at the moment creation of a new
+user namespace requires each of CAP_SYS_ADMIN, CAP_SETUID, and
+CAP_SETGID.
diff --git a/kernel/user.c b/kernel/user.c
index 97202cb..6608a3d 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -239,13 +239,21 @@ static struct kobj_type uids_ktype = {
.release = uids_release,
};

-/* create /sys/kernel/uids/<uid>/cpu_share file for this user */
+/*
+ * Create /sys/kernel/uids/<uid>/cpu_share file for this user
+ * We do not create this file for users in a user namespace (until
+ * sysfs tagging is implemented).
+ *
+ * See Documentation/scheduler/sched-design-CFS.txt for ramifications.
+ */
static int uids_user_create(struct user_struct *up)
{
struct kobject *kobj = &up->kobj;
int error;

memset(kobj, 0, sizeof(struct kobject));
+ if (up->user_ns != &init_user_ns)
+ return 0;
kobj->kset = uids_kset;
error = kobject_init_and_add(kobj, &uids_ktype, NULL, "%d", up->uid);
if (error) {
@@ -281,6 +289,8 @@ static void remove_user_sysfs_dir(struct work_struct *w)
unsigned long flags;
int remove_user = 0;

+ if (up->user_ns != &init_user_ns)
+ return;
/* Make uid_hash_remove() + sysfs_remove_file() + kobject_del()
* atomic.
*/
--
1.5.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/