Re: [PATCH 5/5] cgroup: introduce cgroup namespaces

From: Aditya Kali
Date: Mon Jul 21 2014 - 18:11:52 EST

Next message: Andy Lutomirski: "Re: [PATCH] x86, TSC: Add a software TSC offset"
Previous message: Thomas Gleixner: "Re: [PATCH] x86, TSC: Add a software TSC offset"
In reply to: Andy Lutomirski: "Re: [PATCH 5/5] cgroup: introduce cgroup namespaces"
Next in thread: Andy Lutomirski: "Re: [PATCH 5/5] cgroup: introduce cgroup namespaces"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, Jul 18, 2014 at 11:57 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> On Fri, Jul 18, 2014 at 11:51 AM, Aditya Kali <adityakali@xxxxxxxxxx> wrote:
>> On Fri, Jul 18, 2014 at 9:51 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>> On Jul 17, 2014 1:56 PM, "Aditya Kali" <adityakali@xxxxxxxxxx> wrote:
>>>>
>>>> On Thu, Jul 17, 2014 at 12:57 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>>> > What happens if someone moves a task in a cgroup namespace outside of
>>>> > the namespace root cgroup?
>>>> >
>>>>
>>>> An attempt to move a task outside of the cgroupns root will fail with
>>>> EPERM, irrespective of the privileges of the process attempting it.
>>>> Once a cgroupns is created, the task is confined to the cgroup
>>>> hierarchy under its cgroupns root until it dies.
>>>
>>> Can a task in a non-init userns create a cgroupns? If not, that's
>>> unusual. If so, is it problematic if they can prevent themselves from
>>> being moved?
>>>
>>
>> Currently, only a task with CAP_SYS_ADMIN in the init-userns can
>> create a cgroupns. It is stricter than for other namespaces, yes.
>
> I'm slightly hesitant to have unshare(CLONE_NEWUSER |
> CLONE_NEWCGROUPNS | ...) start having weird side effects that are
> visible outside the namespace, especially when those side effects
> don't happen (because the call fails entirely) if
> unshare(CLONE_NEWUSER) happens first. I don't see a real problem with
> it, but it's weird.
>

I expect this restriction to apply only to the initial version of the
patch. We can make it consistent with other namespaces once we figure
out how cgroupns can be safely enabled for non-init userns.

>>
>>> I hate to say it, but it might be worth requiring explicit permission
>>> from the cgroup manager for this. For example, there could be a new
>>> cgroup attribute may_unshare, and any attempt to unshare the cgroup ns
>>> will fail with -EPERM unless the caller is in a may_unshare=1 cgroup.
>>> may_unshare in a parent cgroup would not give child cgroups the
>>> ability to unshare.
>>>
>>
>> What you suggest can be done. The current patch-set punts the problem
>> of permission checking by only allowing unshare from a
>> capable(CAP_SYS_ADMIN) process. This can be implemented as a follow-up
>> improvement to the cgroupns feature if we want to open it to non-init
>> userns.
>>
>> That being said, I would argue that even if we don't have this
>> explicit permission and relax the check to non-init userns, it should
>> be 'OK' to let ns_capable(current_user_ns(), CAP_SYS_ADMIN) tasks
>> unshare cgroupns (basically, if you can "create" a cgroup hierarchy,
>> you should probably be allowed to unshare() it).
>
> But non-init-userns tasks can't create cgroup hierarchies, unless I
> misunderstand the current code. And, if they can, I bet I can find
> three or four serious security issues in an hour or two. :)
>

A task running in a non-init userns can create cgroup hierarchies if you
chown/chgrp its cgroup root to the task's user:

# while running as 'root' (uid=0)
$ cd $CGROUP_MOUNT
$ mkdir -p batchjobs/c_job_id1/

# transfer ownership to the user (in this case 'nobody' (uid=99)).
$ chown nobody batchjobs/c_job_id1/
$ chgrp nobody batchjobs/c_job_id1/
$ ls -ld batchjobs/c_job_id1/
drwxr-xr-x 2 nobody nobody 0 2014-07-21 12:47 batchjobs/c_job_id1/

# enter container cgroup
$ echo 0 > batchjobs/c_job_id1/cgroup.procs

# unshare both userns and cgroupns
$ unshare -u -c
# set up uid_map and gid_map, and map user '99' into the userns
# $ cat /proc/<pid>/uid_map
# 0 0 1
# 99 99 1
# $ cat /proc/<pid>/gid_map
# 0 0 1
# 99 99 1
# switch to user 'nobody'
$ su nobody
$ id
uid=99(nobody) gid=99(nobody) groups=99(nobody)

# Now user nobody running under non-init userns can create sub-cgroups
# under "batchjobs/c_job_id1/".
# PWD=$CGROUP_MOUNT/batchjobs/c_job_id1
$ mkdir sub_cgroup1
$ ls -ld sub_cgroup1/
drwxr-xr-x 2 nobody nobody 0 2014-07-21 13:11 sub_cgroup1/
$ echo 0 > sub_cgroup1/cgroup.procs
$ cat /proc/self/cgroup
0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/sub_cgroup1
$ ls -l sub_cgroup1/
total 0
-r--r--r-- 1 nobody nobody 0 2014-07-21 13:11 cgroup.controllers
-r--r--r-- 1 nobody nobody 0 2014-07-21 13:11 cgroup.populated
-rw-r--r-- 1 nobody nobody 0 2014-07-21 13:12 cgroup.procs
-rw-r--r-- 1 nobody nobody 0 2014-07-21 13:11 cgroup.subtree_control

This is a powerful feature, as it allows non-root tasks to run
container-management tools and provision their resources properly. But
it makes your suggestion of a 'cgroup.may_unshare' file tricky to
implement: the cgroup owner (the task) would be able to set the file
itself and still unshare cgroupns. Instead, maybe we could just check
whether the task has appropriate (write?) permissions on the cgroup
directory before allowing nested cgroupns creation.
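
To make the write-permission idea concrete, here is a minimal userspace
sketch. It is not kernel code and not part of the patch-set; the function
name may_unshare_cgroupns is hypothetical, and an in-kernel check would
use the kernel's own permission helpers rather than a shell test.

```shell
# Hypothetical helper: userspace approximation of the proposed check.
# Returns 0 only if the given cgroup directory exists and the calling
# user has write permission on it.
may_unshare_cgroupns() {
    cgroup_dir=$1
    # -d: path must be an existing directory
    # -w: caller must have write permission on it
    [ -d "$cgroup_dir" ] && [ -w "$cgroup_dir" ]
}

# Example use (the path is taken from the transcript above):
#   may_unshare_cgroupns "$CGROUP_MOUNT/batchjobs/c_job_id1" && echo allowed
```

Note that a process running as real root passes the -w test on almost any
directory, so this only models the unprivileged case.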

>> By unsharing
>> cgroupns, the tasks can only confine themselves further under their
>> cgroupns-root. As long as they cannot escape that hierarchy, it should
>> be fine.
>
> But they can also *lock* their hierarchy.
>

But locking the tasks inside the hierarchy is really what the cgroupns
feature is trying to provide. I understand that this is a change in
expectation, but with the unified hierarchy there are already
restrictions on where tasks can be moved (only to leaf cgroups). With
cgroup namespaces, this becomes: "only to leaf cgroups within the task's
cgroupns".
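
The namespace half of that rule amounts to a path-confinement test. A
rough sketch, assuming cgroup paths are already normalized (no ".."
components); the function name within_cgroupns_root is hypothetical and
the leaf-cgroup half of the rule is not modeled here:

```shell
# Hypothetical helper: returns 0 if the target cgroup path is the
# cgroupns root itself or lies somewhere beneath it.
within_cgroupns_root() {
    ns_root=${1%/}    # drop one trailing slash, if any
    target=${2%/}
    case "$target" in
        # Exact match, or the root followed by "/" and more path.
        # Requiring the "/" keeps /a/b from matching /a/b2.
        "$ns_root"|"$ns_root"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use:
#   within_cgroupns_root /batchjobs/c_job_id1 /batchjobs/c_job_id1/sub_cgroup1
```

A move whose destination fails this test is the case that would be
rejected with EPERM under the behaviour described earlier in the thread.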

>> In my experience, there is seldom a need to move tasks out of their
>> cgroup. At most, we create a sub-cgroup and move the task there (which
>> is allowed in their cgroupns). Even for a cgroup manager, I can't
>> think of a case where it will be useful to move a task from one cgroup
>> hierarchy to another. Such a move seems overly complicated (even without
>> cgroup namespaces). The cgroup manager can just modify the settings of
>> the task's cgroup as needed or simply kill & restart the task in a new
>> container.
>>
>
> I do this all the time. Maybe my new systemd overlords will make me
> stop doing it, at which point my current production setup will blow
> up.
>

[shudder]
I am surprised that this even works correctly.

Either way, maybe checking cgroup directory permissions will work for
you? That is, if you "chown" a cgroup directory to the user, it should be
OK if the user's task unshares cgroupns under that cgroup and you
don't care about moving tasks out from under that cgroup. Without
ownership of the cgroup directory, creation of a cgroupns will be
disallowed. What do you think?
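
For illustration only, the ownership variant of the check could look
roughly like this from userspace. The function name owns_cgroup_dir is
hypothetical, and "stat -c %u" assumes a GNU coreutils (or compatible)
stat; a kernel implementation would compare against the directory
inode's owner directly.

```shell
# Hypothetical helper: returns 0 only if the given cgroup directory
# exists and is owned by the effective uid of the caller.
owns_cgroup_dir() {
    cgroup_dir=$1
    [ -d "$cgroup_dir" ] || return 1
    # %u prints the numeric uid of the owner.
    owner_uid=$(stat -c %u "$cgroup_dir") || return 1
    [ "$owner_uid" = "$(id -u)" ]
}

# Example use (the path is taken from the transcript above):
#   owns_cgroup_dir "$CGROUP_MOUNT/batchjobs/c_job_id1" || echo "would be refused"
```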

> --Andy

--
Aditya
